A list of researchers in the area of speech, ordered by the number of relevant publications, for the purpose of identifying potential academic supervisors.
Generated at 2023-07-25 22:54:07, arguments: --year_start 2016 --year_end 2023 --author_start_year 1900 --exclude_venue SSW,ASRU,IWSLT,SLT --n_pubs 20 --rank_start 0 --rank_end 200 --output speech_rankings.html
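The arguments above imply a simple counting-and-thresholding procedure: keep publications within the year window, drop the excluded venues, count per author, and keep authors above the publication threshold. A minimal sketch of that logic follows; `rank_authors`, its signature, and the flat `(authors, venue, year)` record format are illustrative assumptions, not the actual generator script.

```python
from collections import Counter

def rank_authors(pubs, year_start=2016, year_end=2023,
                 exclude_venues=("SSW", "ASRU", "IWSLT", "SLT"), n_pubs=20):
    """Rank authors by number of qualifying publications.

    pubs: iterable of (authors, venue, year) tuples, where authors is a
    list of author-name strings (hypothetical input format).
    """
    counts = Counter()
    for authors, venue, year in pubs:
        # Apply the year window and venue exclusions from the arguments.
        if year_start <= year <= year_end and venue not in exclude_venues:
            for author in authors:
                counts[author] += 1
    # Keep only authors meeting the minimum publication count,
    # ordered by descending count.
    return [(a, c) for a, c in counts.most_common() if c >= n_pubs]
```

The real tool additionally resolves DBLP author disambiguation suffixes (e.g. "Haizhou Li 0001") and emits HTML; this sketch covers only the filtering and ranking step.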
=== Haizhou Li 0001 ===
SpeechComm2022
Kun Zhou, Berrak Sisman, Rui Liu 0008, Haizhou Li 0001, 
Emotional voice conversion: Theory, databases and ESD.
SpeechComm2022
Hongning Zhu, Kong Aik Lee, Haizhou Li 0001, 
Discriminative speaker embedding with serialized multi-layer multi-head attention.
TASLP2022
Chitralekha Gupta, Haizhou Li 0001, Masataka Goto, 
Deep Learning Approaches in Topics of Singing Information Processing.
TASLP2022
Zexu Pan, Meng Ge, Haizhou Li 0001, 
USEV: Universal Speaker Extraction With Visual Cue.
ICASSP2022
Marvin Borsdorf, Kevin Scheck, Haizhou Li 0001, Tanja Schultz, 
Experts Versus All-Rounders: Target Language Extraction for Multiple Target Languages.
ICASSP2022
Xiaoxue Gao, Chitralekha Gupta, Haizhou Li 0001, 
Genre-Conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music.
ICASSP2022
Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang 0001, Haizhou Li 0001, 
L-SpEx: Localized Target Speaker Extraction.
ICASSP2022
Tianchi Liu 0004, Rohan Kumar Das, Kong Aik Lee, Haizhou Li 0001, 
MFA: TDNN with Multi-Scale Frequency-Channel Attention for Text-Independent Speaker Verification with Short Utterances.
ICASSP2022
Junchen Lu, Berrak Sisman, Rui Liu 0008, Mingyang Zhang 0003, Haizhou Li 0001, 
Visualtts: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over.
ICASSP2022
Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li 0001, 
Self-Supervised Speaker Recognition with Loss-Gated Learning.
ICASSP2022
Qiquan Zhang, Qi Song, Zhaoheng Ni, Aaron Nicolson, Haizhou Li 0001, 
Time-Frequency Attention for Monaural Speech Enhancement.
ICASSP2022
Jinming Zhao, Ruichen Li, Qin Jin, Xinchao Wang, Haizhou Li 0001, 
Memobert: Pre-Training Model with Prompt-Based Learning for Multimodal Emotion Recognition.
Interspeech2022
Rui Liu 0008, Berrak Sisman, Björn W. Schuller, Guanglai Gao, Haizhou Li 0001, 
Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning.
Interspeech2022
Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu 0001, Haizhou Li 0001, Tom Ko, Lirong Dai 0001, Jinyu Li 0001, Yao Qian, Furu Wei, 
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data.
Interspeech2022
Marvin Borsdorf, Kevin Scheck, Haizhou Li 0001, Tanja Schultz, 
Blind Language Separation: Disentangling Multilingual Cocktail Party Voices by Language.
Interspeech2022
Zongyang Du, Berrak Sisman, Kun Zhou, Haizhou Li 0001, 
Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion.
Interspeech2022
Zexu Pan, Meng Ge, Haizhou Li 0001, 
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction.
Interspeech2022
Zeyang Song, Qi Liu, Qu Yang, Haizhou Li 0001, 
Knowledge distillation for In-memory keyword spotting model.
Interspeech2022
Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang 0006, Tom Ko, Haizhou Li 0001, 
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT.
Interspeech2022
Qu Yang, Qi Liu, Haizhou Li 0001, 
Deep residual spiking neural network for keyword spotting in low-resource settings.
=== Shinji Watanabe 0001 ===
TASLP2022
Shota Horiguchi, Yusuke Fujita, Shinji Watanabe 0001, Yawen Xue, Paola García, 
Encoder-Decoder Based Attractors for End-to-End Neural Diarization.
ICASSP2022
Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan 0003, Brian Yan, Ngoc Thang Vu, Alan W. Black, Shinji Watanabe 0001, 
ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet.
ICASSP2022
Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux, 
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR.
ICASSP2022
Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
The First Multimodal Information Based Speech Processing (Misp) Challenge: Data, Tasks, Baselines And Results.
ICASSP2022
Keqi Deng, Zehui Yang, Shinji Watanabe 0001, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang, 
Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models.
ICASSP2022
Zili Huang, Shinji Watanabe 0001, Shu-Wen Yang, Paola García, Sanjeev Khudanpur, 
Investigating Self-Supervised Learning for Speech Enhancement and Separation.
ICASSP2022
Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe 0001, Tomoki Toda, 
S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations.
ICASSP2022
Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe 0001, 
Integrating Multiple ASR Systems into NLP Backend with Attention Fusion.
ICASSP2022
Takashi Maekaku, Xuankai Chang, Yuya Fujita, Shinji Watanabe 0001, 
An Exploration of Hubert with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion.
ICASSP2022
Niko Moritz, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux, 
Sequence Transduction with Graph-Based Supervision.
ICASSP2022
Motoi Omachi, Yuya Fujita, Shinji Watanabe 0001, Tianzi Wang, 
Non-Autoregressive End-To-End Automatic Speech Recognition Incorporating Downstream Natural Language Processing.
ICASSP2022
Jing Pan, Tao Lei 0001, Kwangyoun Kim, Kyu J. Han, Shinji Watanabe 0001, 
SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition.
ICASSP2022
Brian Yan, Chunlei Zhang, Meng Yu 0003, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe 0001, Dong Yu 0001, 
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization.
ICASSP2022
Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Artyom Astafurov, Caroline Chen, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Z. Yang, Jason Lian, Jeff Hwang, Ji Chen, Peter Goldsborough, Sean Narenthiran, Shinji Watanabe 0001, Soumith Chintala, Vincent Quenneville-Bélair, 
Torchaudio: Building Blocks for Audio and Speech Processing.
Interspeech2022
Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan W. Black, Shinji Watanabe 0001, 
Two-Pass Low Latency End-to-End Spoken Language Understanding.
Interspeech2022
Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel López-Francisco, Jonathan D. Amith, Shinji Watanabe 0001, 
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation.
Interspeech2022
Xuankai Chang, Takashi Maekaku, Yuya Fujita, Shinji Watanabe 0001, 
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation.
Interspeech2022
Hang Chen, Jun Du, Yusheng Dai, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Baocai Yin, Jia Pan, 
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.
Interspeech2022
Keqi Deng, Shinji Watanabe 0001, Jiatong Shi, Siddhant Arora, 
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation.
Interspeech2022
Shuai Guo, Jiatong Shi, Tao Qian, Shinji Watanabe 0001, Qin Jin, 
SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy.
=== Helen Meng ===
TASLP2022
Mengzhe Geng, Xurong Xie, Zi Ye, Tianzi Wang, Guinan Li, Shujie Hu, Xunying Liu, Helen Meng, 
Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition.
TASLP2022
Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.
TASLP2022
Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu 0001, Helen Meng, Hung-Yi Lee, 
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning.
TASLP2022
Boyang Xue, Shoukang Hu, Junhao Xu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Bayesian Neural Network Language Modeling for Speech Recognition.
ICASSP2022
Xueyuan Chen, Changhe Song, Yixuan Zhou, Zhiyong Wu 0001, Changbin Chen, Zhongqin Wu, Helen Meng, 
A Character-Level Span-Based Model for Mandarin Prosodic Structure Prediction.
ICASSP2022
Wenlin Dai, Changhe Song, Xiang Li 0067, Zhiyong Wu 0003, Huashan Pan, Xiulin Li, Helen Meng, 
An End-to-End Chinese Text Normalization Model Based on Rule-Guided Flat-Lattice Transformer.
ICASSP2022
Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.
ICASSP2022
Jingbei Li, Yi Meng, Chenyi Li, Zhiyong Wu 0001, Helen Meng, Chao Weng, Dan Su 0002, 
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling.
ICASSP2022
Jingbei Li, Yi Meng, Zhiyong Wu 0001, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang 0002, 
Neufa: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism.
ICASSP2022
Guinan Li, Jianwei Yu, Jiajun Deng, Xunying Liu, Helen Meng, 
Audio-Visual Multi-Channel Speech Separation, Dereverberation and Recognition.
ICASSP2022
Hang Su, Danyang Zhao, Long Dang, Minglei Li, Xixin Wu, Xunying Liu, Helen Meng, 
A Multitask Learning Framework for Speaker Change Detection with Content Information from Unsupervised Speech Decomposition.
ICASSP2022
Disong Wang, Songxiang Liu, Xixin Wu, Hui Lu, Lifa Sun, Xunying Liu, Helen Meng, 
Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation.
ICASSP2022
Haibin Wu, Po-Chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang 0006, Zhiyong Wu 0001, Helen Meng, Hung-Yi Lee, 
Adversarial Sample Detection for Speaker Verification by Neural Vocoders.
ICASSP2022
Xixin Wu, Shoukang Hu, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Neural Architecture Search for Speech Emotion Recognition.
ICASSP2022
Haibin Wu, Heng-Cheng Kuo, Naijun Zheng, Kuo-Hsuan Hung, Hung-Yi Lee, Yu Tsao 0001, Hsin-Min Wang, Helen Meng, 
Partially Fake Audio Detection by Self-Attention-Based Fake Span Discovery.
ICASSP2022
Junhao Xu, Jianwei Yu, Xunying Liu, Helen Meng, 
Mixed Precision DNN Quantization for Overlapped Speech Separation and Recognition.
ICASSP2022
Xintao Zhao, Feng Liu, Changhe Song, Zhiyong Wu 0001, Shiyin Kang, Deyi Tuo, Helen Meng, 
Disentangling Content and Fine-Grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion.
ICASSP2022
Naijun Zheng, Na Li 0012, Xixin Wu, Lingwei Meng, Jiawen Kang, Haibin Wu, Chao Weng, Dan Su 0002, Helen Meng, 
The CUHK-Tencent Speaker Diarization System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
ICASSP2022
Naijun Zheng, Na Li 0012, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
Multi-Channel Speaker Diarization Using Spatial Features for Meetings.
Interspeech2022
Jun Chen, Wei Rao, Zilin Wang, Zhiyong Wu 0001, Yannan Wang, Tao Yu, Shidong Shang, Helen Meng, 
Speech Enhancement with Fullband-Subband Cross-Attention Network.
=== Björn W. Schuller ===
TASLP2022
Cheng Lu 0005, Yuan Zong, Wenming Zheng, Yang Li 0019, Chuangao Tang, Björn W. Schuller, 
Domain Invariant Feature Learning for Speaker-Independent Speech Emotion Recognition.
ICASSP2022
Kun Qian 0003, Tanja Schultz, Björn W. Schuller, 
An Overview of the FIRST ICASSP Special Session on Computer Audition for Healthcare.
Interspeech2022
Zijiang Yang 0007, Xin Jing, Andreas Triantafyllopoulos, Meishu Song, Ilhan Aslan, Björn W. Schuller, 
An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion.
Interspeech2022
Rui Liu 0008, Berrak Sisman, Björn W. Schuller, Guanglai Gao, Haizhou Li 0001, 
Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning.
Interspeech2022
Yi Chang, Zhao Ren, Thanh Tam Nguyen, Wolfgang Nejdl, Björn W. Schuller, 
Example-based Explanations with Adversarial Attacks for Respiratory Sound Analysis.
Interspeech2022
Jiaming Cheng, Ruiyu Liang, Yue Xie, Li Zhao 0003, Björn W. Schuller, Jie Jia, Yiyuan Peng, 
Cross-Layer Similarity Knowledge Distillation for Speech Enhancement.
Interspeech2022
Adria Mallol-Ragolta, Helena Cuesta, Emilia Gómez, Björn W. Schuller, 
Multi-Type Outer Product-Based Fusion of Respiratory Sounds for Detecting COVID-19.
Interspeech2022
Rodrigo Schoburg Carrillo de Mira, Alexandros Haliassos, Stavros Petridis, Björn W. Schuller, Maja Pantic, 
SVTS: Scalable Video-to-Speech Synthesis.
Interspeech2022
Andreas Triantafyllopoulos, Johannes Wagner 0001, Hagen Wierstorf, Maximilian Schmitt, Uwe Reichel, Florian Eyben, Felix Burkhardt, Björn W. Schuller, 
Probing speech emotion recognition transformers for linguistic knowledge.
Interspeech2022
Andreas Triantafyllopoulos, Markus Fendler, Anton Batliner, Maurice Gerczuk, Shahin Amiriparian, Thomas M. Berghaus, Björn W. Schuller, 
Distinguishing between pre- and post-treatment in the speech of patients with chronic obstructive pulmonary disease.
Interspeech2022
Dominika Woszczyk, Anna Hlédiková, Alican Akman, Soteris Demetriou, Björn W. Schuller, 
Data Augmentation for Dementia Detection in Spoken Language.
TASLP2021
Jiaming Cheng, Ruiyu Liang, Zhenlin Liang, Li Zhao 0003, Chengwei Huang, Björn W. Schuller, 
A Deep Adaptation Network for Speech Enhancement: Combining a Relativistic Discriminator With Multi-Kernel Maximum Mean Discrepancy.
TASLP2021
Kazi Nazmul Haque, Rajib Rana, Jiajun Liu, John H. L. Hansen, Nicholas Cummins, Carlos Busso, Björn W. Schuller, 
Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation Using Fewer Labelled Audio Data.
TASLP2021
N. P. Narendra, Björn W. Schuller, Paavo Alku, 
The Detection of Parkinson's Disease From Speech Using Voice Source Information.
ICASSP2021
Chao Li, Boyang Chen, Ziping Zhao 0001, Nicholas Cummins, Björn W. Schuller, 
Hierarchical Attention-Based Temporal Convolutional Networks for Eeg-Based Emotion Recognition.
ICASSP2021
Srividya Tirunellai Rajamani, Kumar T. Rajamani, Adria Mallol-Ragolta, Shuo Liu, Björn W. Schuller, 
A Novel Attention-Based Gated Recurrent Unit and its Efficacy in Speech Emotion Recognition.
ICASSP2021
Andreas Triantafyllopoulos, Björn W. Schuller, 
The Role of Task and Acoustic Similarity in Audio Transfer Learning: Insights from the Speech Emotion Recognition Case.
ICASSP2021
Panagiotis Tzirakis, Anh Nguyen 0003, Stefanos Zafeiriou, Björn W. Schuller, 
Speech Emotion Recognition Using Semantic Information.
Interspeech2021
Pingchuan Ma 0001, Rodrigo Mira, Stavros Petridis, Björn W. Schuller, Maja Pantic, 
LiRA: Learning Visual Speech Representations from Audio Through Self-Supervision.
Interspeech2021
Alice Baird, Silvan Mertes, Manuel Milling, Lukas Stappen, Thomas Wiest, Elisabeth André, Björn W. Schuller, 
A Prototypical Network Approach for Evaluating Generated Emotional Speech.
=== Shrikanth Narayanan ===
ICASSP2022
Tiantian Feng, Hanieh Hashemi, Murali Annavaram, Shrikanth S. Narayanan, 
Enhancing Privacy Through Domain Adaptive Noise Injection For Speech Emotion Recognition.
Interspeech2022
Tiantian Feng, Shrikanth Narayanan, 
Semi-FedSER: Semi-supervised Learning for Speech Emotion Recognition On Federated Learning using Multiview Pseudo-Labeling.
Interspeech2022
Tiantian Feng, Raghuveer Peri, Shrikanth Narayanan, 
User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition on Federated Learning.
Interspeech2022
Nikolaos Flemotomos, Shrikanth Narayanan, 
Multimodal Clustering with Role Induced Constraints for Speaker Diarization.
ICASSP2021
Amr Gaballah, Abhishek Tiwari, Shrikanth Narayanan, Tiago H. Falk, 
Context-Aware Speech Stress Detection in Hospital Workers Using Bi-LSTM Classifiers.
ICASSP2021
Tae Jin Park, Manoj Kumar 0007, Shrikanth Narayanan, 
Multi-Scale Speaker Diarization with Neural Affinity Score Fusion.
Interspeech2021
Young-Kyung Kim, Rimita Lahiri, Md. Nasir, So Hyun Kim, Somer Bishop, Catherine Lord, Shrikanth S. Narayanan, 
Analyzing Short Term Dynamic Speech Features for Understanding Behavioral Traits of Children with Autism Spectrum Disorder.
Interspeech2021
Haoqi Li, Yelin Kim, Cheng-Hao Kuo, Shrikanth S. Narayanan, 
Acted vs. Improvised: Domain Adaptation for Elicitation Approaches in Audio-Visual Emotion Recognition.
Interspeech2021
Miran Oh, Dani Byrd, Shrikanth S. Narayanan, 
Leveraging Real-Time MRI for Illuminating Linguistic Velum Action.
ICASSP2020
Victor Ardulov, Zane Durante, Shanna Williams, Thomas D. Lyon, Shrikanth Narayanan, 
Identifying Truthful Language in Child Interviews.
ICASSP2020
Sandeep Nallan Chakravarthula, Md. Nasir, Shao-Yen Tseng, Haoqi Li, Tae Jin Park, Brian R. Baucom, Craig J. Bryan, Shrikanth Narayanan, Panayiotis G. Georgiou, 
Automatic Prediction of Suicidal Risk in Military Couples Using Multimodal Interaction Cues from Couples Conversations.
ICASSP2020
Tiantian Feng, Shrikanth S. Narayanan, 
Modeling Behavioral Consistency in Large-Scale Wearable Recordings of Human Bio-Behavioral Signals.
ICASSP2020
S. Ashwin Hebbar, Rahul Sharma, Krishna Somandepalli, Asterios Toutios, Shrikanth Narayanan, 
Vocal Tract Articulatory Contour Detection in Real-Time Magnetic Resonance Images Using Spatio-Temporal Context.
ICASSP2020
Rimita Lahiri, Manoj Kumar 0007, Somer Bishop, Shrikanth Narayanan, 
Learning Domain Invariant Representations for Child-Adult Classification from Speech.
ICASSP2020
Haoqi Li, Ming Tu, Jing Huang 0019, Shrikanth Narayanan, Panayiotis G. Georgiou, 
Speaker-Invariant Affective Representation Learning via Adversarial Training.
ICASSP2020
Monisankha Pal, Manoj Kumar 0007, Raghuveer Peri, Tae Jin Park, So Hyun Kim, Catherine Lord, Somer Bishop, Shrikanth Narayanan, 
Speaker Diarization Using Latent Space Clustering in Generative Adversarial Network.
ICASSP2020
Karan Singla, Shrikanth Narayanan, 
Multitask Learning for Darpa Lorelei's Situation Frame Extraction Task.
ICASSP2020
Jiaxi Wang, Karel Mundnich, Allison T. Knoll, Pat Levitt, Shrikanth Narayanan, 
Bringing in the Outliers: A Sparse Subspace Clustering Approach to Learn a Dictionary of Mouse Ultrasonic Vocalizations.
Interspeech2020
Pavlos Papadopoulos, Shrikanth Narayanan, 
Exploiting Conic Affinity Measures to Design Speech Enhancement Systems Operating in Unseen Noise Conditions.
Interspeech2020
Xiaoyi Qin, Ming Li 0026, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, Haizhou Li 0001, 
The INTERSPEECH 2020 Far-Field Speaker Verification Challenge.
=== DeLiang Wang ===
TASLP2022
Ashutosh Pandey 0004, DeLiang Wang, 
Self-Attending RNN for Speech Enhancement to Improve Cross-Corpus Generalization.
TASLP2022
Hassan Taherian, Ke Tan 0001, DeLiang Wang, 
Multi-Channel Talker-Independent Speaker Separation Through Location-Based Training.
TASLP2022
Heming Wang, DeLiang Wang, 
Neural Cascade Architecture With Triple-Domain Loss for Speech Enhancement.
TASLP2022
Heming Wang, Xueliang Zhang 0001, DeLiang Wang, 
Fusing Bone-Conduction and Air-Conduction Sensors for Complex-Domain Speech Enhancement.
TASLP2022
Hao Zhang, DeLiang Wang, 
Neural Cascade Architecture for Multi-Channel Acoustic Echo Suppression.
ICASSP2022
Ashutosh Pandey 0004, Buye Xu, Anurag Kumar 0003, Jacob Donley, Paul Calamia, DeLiang Wang, 
TPARN: Triple-Path Attentive Recurrent Network for Time-Domain Multichannel Speech Enhancement.
ICASSP2022
Hassan Taherian, Ke Tan 0001, DeLiang Wang, 
Location-Based Training for Multi-Channel Talker-Independent Speaker Separation.
ICASSP2022
Heming Wang, Yao Qian, Xiaofei Wang 0009, Yiming Wang, Chengyi Wang 0002, Shujie Liu 0001, Takuya Yoshioka, Jinyu Li 0001, DeLiang Wang, 
Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction.
ICASSP2022
Zhong-Qiu Wang, DeLiang Wang, 
Localization based Sequential Grouping for Continuous Speech Separation.
ICASSP2022
Heming Wang, DeLiang Wang, 
Cross-Domain Speech Enhancement with a Neural Cascade Architecture.
ICASSP2022
Heming Wang, Xueliang Zhang 0001, DeLiang Wang, 
Attention-Based Fusion for Bone-Conducted and Air-Conducted Speech Enhancement in the Complex Domain.
ICASSP2022
Fan Yu, Shiliang Zhang, Pengcheng Guo, Yihui Fu, Zhihao Du, Siqi Zheng, Weilong Huang, Lei Xie 0001, Zheng-Hua Tan, DeLiang Wang, Yanmin Qian, Kong Aik Lee, Zhijie Yan, Bin Ma 0001, Xin Xu, Hui Bu, 
Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.
ICASSP2022
Hao Zhang, DeLiang Wang, 
Neural Cascade Architecture for Joint Acoustic Echo and Noise Suppression.
Interspeech2022
Ashutosh Pandey 0004, DeLiang Wang, 
Attentive Training: A New Training Framework for Talker-independent Speaker Extraction.
Interspeech2022
Ashutosh Pandey 0004, Buye Xu, Anurag Kumar 0003, Jacob Donley, Paul Calamia, DeLiang Wang, 
Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network.
Interspeech2022
Haohe Liu, Woosung Choi, Xubo Liu, Qiuqiang Kong, Qiao Tian, DeLiang Wang, 
Neural Vocoder is All You Need for Speech Super-resolution.
Interspeech2022
Haohe Liu, Xubo Liu, Qiuqiang Kong, Qiao Tian, Yan Zhao 0010, DeLiang Wang, Chuanzeng Huang, Yuxuan Wang 0002, 
VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration.
Interspeech2022
Hao Zhang, Ashutosh Pandey 0004, DeLiang Wang, 
Attentive Recurrent Network for Low-Latency Active Noise Control.
Interspeech2022
Yixuan Zhang, Heming Wang, DeLiang Wang, 
Densely-connected Convolutional Recurrent Network for Fundamental Frequency Estimation in Noisy Speech.
TASLP2021
Hao Li 0046, DeLiang Wang, Xueliang Zhang 0001, Guanglai Gao, 
Recurrent Neural Networks and Acoustic Features for Frame-Level Signal-to-Noise Ratio Estimation.
=== Dong Yu 0001 ===
ICASSP2022
Jiachen Lian, Chunlei Zhang, Dong Yu 0001, 
Robust Disentangled Variational Speech Representation Learning for Zero-Shot Voice Conversion.
ICASSP2022
Songxiang Liu, Shan Yang, Dan Su 0002, Dong Yu 0001, 
Referee: Towards Reference-Free Cross-Speaker Style Transfer with Low-Quality Data for Expressive Speech Synthesis.
ICASSP2022
Dongpeng Ma, Yiwen Wang, Liqiang He, Mingjie Jin, Dan Su 0002, Dong Yu 0001, 
DP-DWA: Dual-Path Dynamic Weight Attention Network With Streaming Dfsmn-San For Automatic Speech Recognition.
ICASSP2022
Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu 0003, Zhenyu Tang 0001, Dinesh Manocha, Dong Yu 0001, 
Fast-Rir: Fast Neural Diffuse Room Impulse Response Generator.
ICASSP2022
Brian Yan, Chunlei Zhang, Meng Yu 0003, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe 0001, Dong Yu 0001, 
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization.
ICASSP2022
Zhao You, Shulin Feng, Dan Su 0002, Dong Yu 0001, 
Speechmoe2: Mixture-of-Experts Model with Improved Routing.
ICASSP2022
Chunlei Zhang, Jiatong Shi, Chao Weng, Meng Yu 0003, Dong Yu 0001, 
Towards end-to-end Speaker Diarization with Generalized Neural Speaker Clustering.
Interspeech2022
Ziqian Dai, Jianwei Yu, Yan Wang, Nuo Chen, Yanyao Bian, Guangzhi Li, Deng Cai 0002, Dong Yu 0001, 
Automatic Prosody Annotation with Pre-Trained Text-Speech Model.
Interspeech2022
Vinay Kothapally, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Dong Yu 0001, 
Joint Neural AEC and Beamforming with Double-Talk Detection.
Interspeech2022
Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu 0001, 
Towards Improved Zero-shot Voice Conversion with Conditional DSVAE.
Interspeech2022
Jinchuan Tian, Jianwei Yu, Chunlei Zhang, Yuexian Zou, Dong Yu 0001, 
LAE: Language-Aware Encoder for Monolingual and Multilingual ASR.
ICLR2022
Max W. Y. Lam, Jun Wang 0091, Dan Su 0002, Dong Yu 0001, 
BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis.
IJCAI2022
Rongjie Huang, Max W. Y. Lam, Jun Wang 0091, Dan Su 0002, Dong Yu 0001, Yi Ren 0006, Zhou Zhao, 
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis.
NAACL2022
Dian Yu 0001, Ben Zhou, Dong Yu 0001, 
End-to-End Chinese Speaker Identification.
TASLP2021
Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu 0004, Meng Yu 0003, Dong Yu 0001, Jesper Jensen 0001, 
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation.
TASLP2021
Kun Xu 0005, Han Wu 0004, Linfeng Song, Haisong Zhang, Linqi Song, Dong Yu 0001, 
Conversational Semantic Role Labeling.
TASLP2021
Jianwei Yu, Shi-Xiong Zhang, Bo Wu, Shansong Liu, Shoukang Hu, Mengzhe Geng, Xunying Liu, Helen Meng, Dong Yu 0001, 
Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech.
TASLP2021
Zhuohuang Zhang, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Lianwu Chen, Donald S. Williamson, Dong Yu 0001, 
Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation.
ICASSP2021
Liqiang He, Dan Su 0002, Dong Yu 0001, 
Learned Transferable Architectures Can Surpass Hand-Designed Architectures for Large Scale Speech Recognition.
ICASSP2021
Max W. Y. Lam, Jun Wang 0091, Dan Su 0002, Dong Yu 0001, 
Sandglasset: A Light Multi-Granularity Self-Attentive Network for Time-Domain Speech Separation.
=== Yanmin Qian ===
TASLP2022
Chenda Li, Zhuo Chen 0006, Yanmin Qian, 
Dual-Path Modeling With Memory Embedding Model for Continuous Speech Separation.
TASLP2022
Yanmin Qian, Xun Gong 0005, Houjun Huang, 
Layer-Wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition.
TASLP2022
Yanmin Qian, Zhikai Zhou, 
Optimizing Data Usage for Low-Resource Speech Recognition.
ICASSP2022
Zhengyang Chen, Sanyuan Chen, Yu Wu 0012, Yao Qian, Chengyi Wang 0002, Shujie Liu 0001, Yanmin Qian, Michael Zeng 0001, 
Large-Scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification.
ICASSP2022
Bing Han, Zhengyang Chen, Bei Liu, Yanmin Qian, 
MLP-SVNET: A Multi-Layer Perceptrons Based Network for Speaker Verification.
ICASSP2022
Bing Han, Zhengyang Chen, Yanmin Qian, 
Local Information Modeling with Self-Attention for Speaker Verification.
ICASSP2022
Chenda Li, Lei Yang, Weiqin Wang, Yanmin Qian, 
Skim: Skipping Memory Lstm for Low-Latency Real-Time Continuous Speech Separation.
ICASSP2022
Wei Wang, Xun Gong 0005, Yifei Wu, Zhikai Zhou, Chenda Li, Wangyou Zhang, Bing Han, Yanmin Qian, 
The Sjtu System For Multimodal Information Based Speech Processing Challenge 2021.
ICASSP2022
Yifei Wu, Chenda Li, Jinfeng Bai, Zhongqin Wu, Yanmin Qian, 
Time-Domain Audio-Visual Speech Separation on Low Quality Videos.
ICASSP2022
Fan Yu, Shiliang Zhang, Pengcheng Guo, Yihui Fu, Zhihao Du, Siqi Zheng, Weilong Huang, Lei Xie 0001, Zheng-Hua Tan, DeLiang Wang, Yanmin Qian, Kong Aik Lee, Zhijie Yan, Bin Ma 0001, Xin Xu, Hui Bu, 
Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.
ICASSP2022
Zhikai Zhou, Tian Tan 0002, Yanmin Qian, 
Punctuation Prediction for Streaming On-Device Speech Recognition.
Interspeech2022
Xun Gong 0005, Zhikai Zhou, Yanmin Qian, 
Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition.
Interspeech2022
Bing Han, Zhengyang Chen, Yanmin Qian, 
Self-Supervised Speaker Verification Using Dynamic Loss-Gate and Label Correction.
Interspeech2022
Tao Liu, Shuai Fan 0005, Xu Xiang, Hongbo Song, Shaoxiong Lin, Jiaqi Sun, Tianyuan Han, Siyuan Chen, Binwei Yao, Sen Liu, Yifei Wu, Yanmin Qian, Kai Yu 0004, 
MSDWild: Multi-modal Speaker Diarization Dataset in the Wild.
Interspeech2022
Bei Liu, Zhengyang Chen, Yanmin Qian, 
Attentive Feature Fusion for Robust Speaker Verification.
Interspeech2022
Bei Liu, Zhengyang Chen, Yanmin Qian, 
Dual Path Embedding Learning for Speaker Verification with Triplet Attention.
Interspeech2022
Bei Liu, Zhengyang Chen, Shuai Wang 0016, Haoyu Wang, Bing Han, Yanmin Qian, 
DF-ResNet: Boosting Speaker Verification Performance with Depth-First Design.
Interspeech2022
Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao 0001, Yanmin Qian, Shinji Watanabe 0001, 
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.
Interspeech2022
Wangyou Zhang, Zhuo Chen 0006, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei, 
Separating Long-Form Speech with Group-wise Permutation Invariant Training.
Interspeech2022
Leying Zhang, Zhengyang Chen, Yanmin Qian, 
Enroll-Aware Attentive Statistics Pooling for Target Speaker Verification.
=== John H. L. Hansen ===
SpeechComm2022
Rasa Lileikyte, Dwight Irvin, John H. L. Hansen, 
Assessing child communication engagement and statistical speech patterns for American English via speech recognition in naturalistic active learning spaces.
TASLP2022
Vinay Kothapally, John H. L. Hansen, 
SkipConvGAN: Monaural Speech Dereverberation Using Generative Adversarial Networks via Complex Time-Frequency Masking.
TASLP2022
Zhenyu Wang, John H. L. Hansen, 
Multi-Source Domain Adaptation for Text-Independent Forensic Speaker Recognition.
Interspeech2022
Chelzy Belitz, John H. L. Hansen, 
Challenges in Metadata Creation for Massive Naturalistic Team-Based Audio Data.
Interspeech2022
Avamarie Brueggeman, John H. L. Hansen, 
Speaker Trait Enhancement for Cochlear Implant Users: A Case Study for Speaker Emotion Perception.
Interspeech2022
Szu-Jui Chen, Jiamin Xie, John H. L. Hansen, 
FeaRLESS: Feature Refinement Loss for Ensembling Self-Supervised Learning Features in Robust End-to-end Speech Recognition.
Interspeech2022
Satwik Dutta, Sarah Anne Tao, Jacob C. Reyna, Rebecca Elizabeth Hacker, Dwight W. Irvin, Jay F. Buzhardt, John H. L. Hansen, 
Challenges remain in Building ASR for Spontaneous Preschool Children Speech in Naturalistic Educational Environments.
Interspeech2022
John H. L. Hansen, Zhenyu Wang, 
Audio Anti-spoofing Using Simple Attention Module and Joint Optimization Based on Additive Angular Margin Loss and Meta-learning.
Interspeech2022
Vinay Kothapally, John H. L. Hansen, 
Complex-Valued Time-Frequency Self-Attention for Speech Dereverberation.
Interspeech2022
Juliana N. Saba, John H. L. Hansen, 
Speech Modification for Intelligibility in Cochlear Implant Listeners: Individual Effects of Vowel- and Consonant-Boosting.
Interspeech2022
Mufan Sang, John H. L. Hansen, 
Multi-Frequency Information Enhanced Channel Attention Module for Speaker Representation Learning.
Interspeech2022
Jiamin Xie, John H. L. Hansen, 
DEFORMER: Coupling Deformed Localized Patterns with Global Context for Robust End-to-end Speech Recognition.
Interspeech2022
Mu Yang, Kevin Hirschi, Stephen Daniel Looney, Okim Kang, John H. L. Hansen, 
Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment.
SpeechComm2021
Fahimeh Bahmaninezhad, Chunlei Zhang, John H. L. Hansen, 
An investigation of domain adaptation in speaker embedding space for speaker recognition.
SpeechComm2021
Shivesh Ranjan, John H. L. Hansen, 
Curriculum Learning based approaches for robust end-to-end far-field speech recognition.
SpeechComm2021
John H. L. Hansen, Allen R. Stauffer, Wei Xia, 
Nonlinear waveform distortion: Assessment and detection of clipping on speech data and systems.
TASLP2021
Kazi Nazmul Haque, Rajib Rana, Jiajun Liu, John H. L. Hansen, Nicholas Cummins, Carlos Busso, Björn W. Schuller, 
Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation Using Fewer Labelled Audio Data.
TASLP2021
Finnian Kelly, John H. L. Hansen, 
Analysis and Calibration of Lombard Effect and Whisper for Speaker Recognition.
TASLP2021
Midia Yousefi, John H. L. Hansen, 
Block-Based High Performance CNN Architectures for Frame-Level Overlapping Speech Detection.
ICASSP2021
Mufan Sang, Wei Xia, John H. L. Hansen, 
DEAAN: Disentangled Embedding and Adversarial Adaptation Network for Robust Speaker Representation Learning.
TASLP2022
Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi, 
Optimizing Tandem Speaker Verification and Anti-Spoofing Systems.
TASLP2022
Xuan Shi, Erica Cooper, Junichi Yamagishi, 
Use of Speaker Recognition Approaches for Learning and Evaluating Embedding Representations of Musical Instrument Sounds.
TASLP2022
Brij Mohan Lal Srivastava, Mohamed Maouche, Md. Sahidullah, Emmanuel Vincent 0001, Aurélien Bellet, Marc Tommasi, Natalia A. Tomashenko, Xin Wang 0037, Junichi Yamagishi, 
Privacy and Utility of X-Vector Based Speaker Anonymization.
ICASSP2022
Erica Cooper, Wen-Chin Huang, Tomoki Toda, Junichi Yamagishi, 
Generalization Ability of MOS Prediction Networks.
ICASSP2022
Wen-Chin Huang, Erica Cooper, Junichi Yamagishi, Tomoki Toda, 
LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech.
ICASSP2022
Cheng-I Jeff Lai, Erica Cooper, Yang Zhang 0001, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David D. Cox, James R. Glass, 
On the Interplay between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis.
ICASSP2022
Xin Wang 0037, Junichi Yamagishi, 
Estimating the Confidence of Speech Spoofing Countermeasure.
ICASSP2022
Chang Zeng, Xin Wang 0037, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi, 
Attention Back-End for Automatic Speaker Verification with Multiple Enrollment Utterances.
Interspeech2022
Wen-Chin Huang, Erica Cooper, Yu Tsao 0001, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi, 
The VoiceMOS Challenge 2022.
Interspeech2022
Haoyu Li, Junichi Yamagishi, 
DDS: A new device-degraded speech dataset for speech enhancement.
Interspeech2022
Xiaoxiao Miao, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, Natalia A. Tomashenko, 
Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions.
Interspeech2022
Chang Zeng, Lin Zhang, Meng Liu, Junichi Yamagishi, 
Spoofing-Aware Attention based ASV Back-end with Multiple Enrollment Utterances and a Sampling Strategy for the SASV Challenge 2022.
TASLP2021
Haoyu Li, Junichi Yamagishi, 
Multi-Metric Optimization Using Generative Adversarial Networks for Near-End Speech Intelligibility Enhancement.
ICASSP2021
Shuhei Kato, Yusuke Yasuda, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, 
How Similar or Different is Rakugo Speech Synthesizer to Professional Performers?
ICASSP2021
Jennifer Williams 0001, Yi Zhao 0006, Erica Cooper, Junichi Yamagishi, 
Learning Disentangled Phone and Speaker Representations in a Semi-Supervised VQ-VAE Paradigm.
ICASSP2021
Yusuke Yasuda, Xin Wang 0037, Junichi Yamagishi, 
End-to-End Text-to-Speech Using Latent Duration Based on VQ-VAE.
Interspeech2021
Tomi Kinnunen, Andreas Nautsch, Md. Sahidullah, Nicholas W. D. Evans, Xin Wang 0037, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee, 
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing.
Interspeech2021
Xin Wang 0037, Junichi Yamagishi, 
A Comparative Study on Recent Neural Spoofing Countermeasures for Synthetic Speech Detection.
Interspeech2021
Lin Zhang, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, Jose Patino 0001, Nicholas W. D. Evans, 
An Initial Investigation for Detecting Partially Spoofed Audio.
TASLP2020
Tomi Kinnunen, Héctor Delgado, Nicholas W. D. Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang 0037, Md. Sahidullah, Junichi Yamagishi, Douglas A. Reynolds, 
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals.
TASLP2022
Mengzhe Geng, Xurong Xie, Zi Ye, Tianzi Wang, Guinan Li, Shujie Hu, Xunying Liu, Helen Meng, 
Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition.
TASLP2022
Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.
TASLP2022
Boyang Xue, Shoukang Hu, Junhao Xu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Bayesian Neural Network Language Modeling for Speech Recognition.
ICASSP2022
Guinan Li, Jianwei Yu, Jiajun Deng, Xunying Liu, Helen Meng, 
Audio-Visual Multi-Channel Speech Separation, Dereverberation and Recognition.
ICASSP2022
Hang Su, Danyang Zhao, Long Dang, Minglei Li, Xixin Wu, Xunying Liu, Helen Meng, 
A Multitask Learning Framework for Speaker Change Detection with Content Information from Unsupervised Speech Decomposition.
ICASSP2022
Disong Wang, Songxiang Liu, Xixin Wu, Hui Lu, Lifa Sun, Xunying Liu, Helen Meng, 
Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation.
ICASSP2022
Xixin Wu, Shoukang Hu, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Neural Architecture Search for Speech Emotion Recognition.
ICASSP2022
Junhao Xu, Jianwei Yu, Xunying Liu, Helen Meng, 
Mixed Precision DNN Quantization for Overlapped Speech Separation and Recognition.
ICASSP2022
Naijun Zheng, Na Li 0012, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
Multi-Channel Speaker Diarization Using Spatial Features for Meetings.
Interspeech2022
Mingyu Cui, Jiajun Deng, Shoukang Hu, Xurong Xie, Tianzi Wang, Shujie Hu, Mengzhe Geng, Boyang Xue, Xunying Liu, Helen Meng, 
Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems.
Interspeech2022
Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Mengzhe Geng, Guinan Li, Xunying Liu, Helen Meng, 
Confidence Score Based Conformer Speaker Adaptation for Speech Recognition.
Interspeech2022
Jinchao Li, Shuai Wang, Yang Chao, Xunying Liu, Helen Meng, 
Context-aware Multimodal Fusion for Emotion Recognition.
Interspeech2022
Tianzi Wang, Jiajun Deng, Mengzhe Geng, Zi Ye, Shoukang Hu, Yi Wang, Mingyu Cui, Zengrui Jin, Xunying Liu, Helen Meng, 
Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection.
Interspeech2022
Yi Wang, Tianzi Wang, Zi Ye, Lingwei Meng, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng, 
Exploring linguistic feature and model combination for speech recognition based automatic AD detection.
Interspeech2022
Junhao Xu, Shoukang Hu, Xunying Liu, Helen Meng, 
Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus.
TASLP2021
Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng, 
Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition.
TASLP2021
Songxiang Liu, Yuewen Cao, Disong Wang, Xixin Wu, Xunying Liu, Helen Meng, 
Any-to-Many Voice Conversion With Location-Relative Sequence-to-Sequence Modeling.
TASLP2021
Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng, 
Recent Progress in the CUHK Dysarthric Speech Recognition System.
TASLP2021
Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Shiyin Kang, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Exemplar-Based Emotive Speech Synthesis.
TASLP2021
Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Disong Wang, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Speech Emotion Recognition Using Sequential Capsule Networks.
TASLP2022
Xiaochun An, Frank K. Soong, Lei Xie 0001, 
Disentangling Style and Speaker Attributes for TTS Style Transfer.
TASLP2022
Liumeng Xue, Frank K. Soong, Shaofei Zhang, Lei Xie 0001, 
ParaTTS: Learning Linguistic and Prosodic Cross-Sentence Information in Paragraph-Based TTS.
ICASSP2022
Yihui Fu, Yun Liu, Jingdong Li, Dawei Luo, Shubo Lv, Yukai Jv, Lei Xie 0001, 
Uformer: A Unet Based Dilated Complex & Real Dual-Path Conformer Network for Simultaneous Speech Enhancement and Dereverberation.
ICASSP2022
Kun Wei, Yike Zhang, Sining Sun, Lei Xie 0001, Long Ma, 
Conversational Speech Recognition by Learning Conversation-Level Characteristics.
ICASSP2022
Fan Yu, Shiliang Zhang, Pengcheng Guo, Yihui Fu, Zhihao Du, Siqi Zheng, Weilong Huang, Lei Xie 0001, Zheng-Hua Tan, DeLiang Wang, Yanmin Qian, Kong Aik Lee, Zhijie Yan, Bin Ma 0001, Xin Xu, Hui Bu, 
Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.
ICASSP2022
Yongmao Zhang, Jian Cong, Heyang Xue, Lei Xie 0001, Pengcheng Zhu, Mengxiao Bi, 
VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis.
Interspeech2022
Yi Lei, Shan Yang, Jian Cong, Lei Xie 0001, Dan Su 0002, 
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion.
Interspeech2022
Tao Li, Xinsheng Wang, Qicong Xie, Zhichao Wang, Mingqi Jiang, Lei Xie 0001, 
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis.
Interspeech2022
Qijie Shao, Jinghao Yan, Jian Kang 0006, Pengcheng Guo, Xian Shi, Pengfei Hu, Lei Xie 0001, 
Linguistic-Acoustic Similarity Based Accent Shift for Accent Recognition.
Interspeech2022
Yu Wang, Xinsheng Wang, Pengcheng Zhu, Jie Wu, Hanzhao Li, Heyang Xue, Yongmao Zhang, Lei Xie 0001, Mengxiao Bi, 
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis.
Interspeech2022
Kun Wei, Yike Zhang, Sining Sun, Lei Xie 0001, Long Ma, 
Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR.
Interspeech2022
Heyang Xue, Xinsheng Wang, Yongmao Zhang, Lei Xie 0001, Pengcheng Zhu, Mengxiao Bi, 
Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher.
Interspeech2022
Liumeng Xue, Shan Yang, Na Hu, Dan Su 0002, Lei Xie 0001, 
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers.
Interspeech2022
Zhanheng Yang, Hang Lv 0001, Xiong Wang, Ao Zhang, Lei Xie 0001, 
Minimizing Sequential Confusion Error in Speech Command Recognition.
Interspeech2022
Zhanheng Yang, Sining Sun, Jin Li, Xiaoming Zhang, Xiong Wang, Long Ma, Lei Xie 0001, 
CaTT-KWS: A Multi-stage Customized Keyword Spotting Framework based on Cascaded Transducer-Transformer.
Interspeech2022
Fan Yu, Zhihao Du, Shiliang Zhang, Yuxiao Lin, Lei Xie 0001, 
A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings.
Interspeech2022
Li Zhang 0084, Yue Li, Huan Zhao, Qing Wang 0039, Lei Xie 0001, 
Backend Ensemble for Speaker Verification and Spoofing Countermeasure.
Interspeech2022
Shimin Zhang, Ziteng Wang, Yukai Ju, Yihui Fu, Yueyue Na, Qiang Fu, Lei Xie 0001, 
Personalized Acoustic Echo Cancellation for Full-duplex Communications.
Interspeech2022
Binbin Zhang, Di Wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv 0001, Lei Xie 0001, Chao Yang, Fuping Pan, Jianwei Niu 0002, 
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit.
SpeechComm2021
Hongqiang Du, Xiaohai Tian, Lei Xie 0001, Haizhou Li 0001, 
Factorized WaveNet for voice conversion with limited data.
TASLP2022
Sung-Feng Huang, Chyi-Jiunn Lin, Da-Rong Liu, Yi-Chen Chen, Hung-yi Lee, 
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech.
TASLP2022
Da-Rong Liu, Po-Chun Hsu, Yi-Chen Chen, Sung-Feng Huang, Shun-Po Chuang, Da-Yi Wu, Hung-yi Lee, 
Learning Phone Recognition From Unpaired Audio and Phone Sequences Based on Generative Adversarial Network.
TASLP2022
Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu 0001, Helen Meng, Hung-Yi Lee, 
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning.
ICASSP2022
Heng-Jui Chang, Shu-Wen Yang, Hung-yi Lee, 
Distilhubert: Speech Representation Learning by Layer-Wise Distillation of Hidden-Unit Bert.
ICASSP2022
Chien-yu Huang, Kai-Wei Chang, Hung-Yi Lee, 
Toward Degradation-Robust Voice Conversion.
ICASSP2022
Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe 0001, Tomoki Toda, 
S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations.
ICASSP2022
Yen Meng, Yi-Hui Chou, Andy T. Liu, Hung-yi Lee, 
Don't Speak Too Fast: The Impact of Data Bias on Self-Supervised Speech Models.
ICASSP2022
Haibin Wu, Po-Chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang 0006, Zhiyong Wu 0001, Helen Meng, Hung-Yi Lee, 
Adversarial Sample Detection for Speaker Verification by Neural Vocoders.
ICASSP2022
Haibin Wu, Heng-Cheng Kuo, Naijun Zheng, Kuo-Hsuan Hung, Hung-Yi Lee, Yu Tsao 0001, Hsin-Min Wang, Helen Meng, 
Partially Fake Audio Detection by Self-Attention-Based Fake Span Discovery.
Interspeech2022
Chih-Chiang Chang, Hung-yi Lee, 
Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation.
Interspeech2022
Kai-Wei Chang, Wei-Cheng Tseng, Shang-Wen Li 0001, Hung-yi Lee, 
An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks.
Interspeech2022
Wei-Ping Huang, Po-Chun Chen, Sung-Feng Huang, Hung-yi Lee, 
Few Shot Cross-Lingual TTS Using Transferable Phoneme Embedding.
Interspeech2022
Kuan-Po Huang, Yu-Kuan Fu, Yu Zhang 0033, Hung-yi Lee, 
Improving Distortion Robustness of Self-supervised Speech Processing Tasks with Domain Adaptation.
Interspeech2022
Guan-Ting Lin, Shang-Wen Li 0001, Hung-yi Lee, 
Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition.
Interspeech2022
Guan-Ting Lin, Yung-Sung Chuang, Ho-Lam Chung, Shu-Wen Yang, Hsuan-Jui Chen, Shuyan Annie Dong, Shang-Wen Li 0001, Abdelrahman Mohamed, Hung-yi Lee, Lin-Shan Lee, 
DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering.
Interspeech2022
Wei-Cheng Tseng, Wei-Tsung Kao, Hung-yi Lee, 
DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores.
Interspeech2022
Wei-Cheng Tseng, Wei-Tsung Kao, Hung-yi Lee, 
Membership Inference Attacks Against Self-supervised Speech Models.
Interspeech2022
Haibin Wu, Lingwei Meng, Jiawen Kang, Jinchao Li, Xu Li, Xixin Wu, Hung-yi Lee, Helen Meng, 
Spoofing-Aware Speaker Verification by Multi-Level Fusion.
Interspeech2022
Yang Zhang, Zhiqiang Lv, Haibin Wu, Shanshan Zhang, Pengfei Hu, Zhiyong Wu 0001, Hung-yi Lee, Helen Meng, 
MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification.
ACL2022
Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-Wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li 0001, Shinji Watanabe 0001, Abdelrahman Mohamed, Hung-yi Lee, 
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities.
TASLP2022
Xiaoqiang Wang, Yanqing Liu, Jinyu Li 0001, Veljko Miljanic, Sheng Zhao, Hosam Khalil, 
Towards Contextual Spelling Correction for Customization of End-to-End Speech Recognition Systems.
ICASSP2022
Liang Lu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Endpoint Detection for Streaming End-to-End Multi-Talker ASR.
ICASSP2022
Desh Raj, Liang Lu 0001, Zhuo Chen 0006, Yashesh Gaur, Jinyu Li 0001, 
Continuous Streaming Multi-Talker ASR with Dual-Path Transducers.
ICASSP2022
Yiming Wang, Jinyu Li 0001, Heming Wang, Yao Qian, Chengyi Wang 0002, Yu Wu 0012, 
Wav2vec-Switch: Contrastive Learning from Original-Noisy Speech Pairs for Robust Speech Recognition.
ICASSP2022
Heming Wang, Yao Qian, Xiaofei Wang 0009, Yiming Wang, Chengyi Wang 0002, Shujie Liu 0001, Takuya Yoshioka, Jinyu Li 0001, DeLiang Wang, 
Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction.
ICASSP2022
Chengyi Wang 0002, Yu Wu 0012, Sanyuan Chen, Shujie Liu 0001, Jinyu Li 0001, Yao Qian, Zhenglu Yang, 
Improving Self-Supervised Learning for Speech Recognition with Intermediate Layer Supervision.
ICASSP2022
Yixuan Zhang, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li 0001, 
Continuous Speech Separation with Recurrent Selective Attention Network.
ICASSP2022
Long Zhou, Jinyu Li 0001, Eric Sun, Shujie Liu 0001, 
A Configurable Multilingual Model is All You Need to Recognize All Languages.
Interspeech2022
Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu 0001, Haizhou Li 0001, Tom Ko, Lirong Dai 0001, Jinyu Li 0001, Yao Qian, Furu Wei, 
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data.
Interspeech2022
Sanyuan Chen, Yu Wu 0012, Chengyi Wang 0002, Shujie Liu 0001, Zhuo Chen 0006, Peidong Wang, Gang Liu, Jinyu Li 0001, Jian Wu 0027, Xiangzhan Yu, Furu Wei, 
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
Interspeech2022
Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li 0001, Xie Chen 0001, Yu Wu 0012, Yifan Gong 0001, 
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.
Interspeech2022
Chengyi Wang 0002, Yiming Wang, Yu Wu 0012, Sanyuan Chen, Jinyu Li 0001, Shujie Liu 0001, Furu Wei, 
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training.
Interspeech2022
Jian Xue, Peidong Wang, Jinyu Li 0001, Matt Post, Yashesh Gaur, 
Large-Scale Streaming End-to-End Speech Translation with Neural Transducers.
Interspeech2022
Wangyou Zhang, Zhuo Chen 0006, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei, 
Separating Long-Form Speech with Group-wise Permutation Invariant Training.
ACL2022
Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang 0002, Shuo Ren, Yu Wu 0012, Shujie Liu 0001, Tom Ko, Qing Li, Yu Zhang 0006, Zhihua Wei, Yao Qian, Jinyu Li 0001, Furu Wei, 
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing.
ICASSP2021
Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Takuya Yoshioka, Shujie Liu 0001, Jinyu Li 0001, Xiangzhan Yu, 
Don't Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer.
ICASSP2021
Xie Chen 0001, Yu Wu 0012, Zhenghao Wang, Shujie Liu 0001, Jinyu Li 0001, 
Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset.
ICASSP2021
Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Jinyu Li 0001, Takuya Yoshioka, Chengyi Wang 0002, Shujie Liu 0001, Ming Zhou 0001, 
Continuous Speech Separation with Conformer.
ICASSP2022
Yuan Gong, Ziyi Chen, Iek-Heng Chu, Peng Chang 0002, James R. Glass, 
Transformer-Based Multi-Aspect Multi-Granularity Non-Native English Speaker Pronunciation Assessment.
ICASSP2022
Yuan Gong, Jin Yu, James R. Glass, 
Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition.
ICASSP2022
Sameer Khurana, Antoine Laurent, James R. Glass, 
Magic Dust for Cross-Lingual Adaptation of Monolingual Wav2vec-2.0.
ICASSP2022
Cheng-I Jeff Lai, Erica Cooper, Yang Zhang 0001, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David D. Cox, James R. Glass, 
On the Interplay between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis.
Interspeech2022
Alexander H. Liu, Cheng-I Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James R. Glass, 
Simple and Effective Unsupervised Speech Synthesis.
AAAI2022
Yuan Gong, Cheng-I Lai, Yu-An Chung, James R. Glass, 
SSAST: Self-Supervised Audio Spectrogram Transformer.
TASLP2021
Yuan Gong, Yu-An Chung, James R. Glass, 
PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation.
ICASSP2021
Yu-An Chung, Yonatan Belinkov, James R. Glass, 
Similarity Analysis of Self-Supervised Speech Representations.
ICASSP2021
Cheng-I Lai, Yung-Sung Chuang, Hung-Yi Lee, Shang-Wen Li 0001, James R. Glass, 
Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining.
Interspeech2021
Yuan Gong, Yu-An Chung, James R. Glass, 
AST: Audio Spectrogram Transformer.
Interspeech2021
R'mani Haulcy, James R. Glass, 
CLAC: A Speech Corpus of Healthy English Speakers.
Interspeech2021
Alexander H. Liu, Yu-An Chung, James R. Glass, 
Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies.
Interspeech2021
Hongyin Luo, James R. Glass, Garima Lalwani, Yi Zhang, Shang-Wen Li 0001, 
Joint Retrieval-Extraction Training for Evidence-Aware Dialog Response Selection.
Interspeech2021
Ian Palmer, Andrew Rouditchenko, Andrei Barbu, Boris Katz, James R. Glass, 
Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset.
Interspeech2021
Andrew Rouditchenko, Angie W. Boggust, David Harwath, Samuel Thomas 0001, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogério Feris, Brian Kingsbury, Michael Picheny, James R. Glass, 
Cascaded Multilingual Audio-Visual Learning from Videos.
Interspeech2021
Andrew Rouditchenko, Angie W. Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas 0001, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogério Schmidt Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba 0001, James R. Glass, 
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.
NeurIPS2021
Cheng-I Jeff Lai, Yang Zhang 0001, Alexander H. Liu, Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David D. Cox, Jim Glass, 
PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition.
ACL2021
Wei-Ning Hsu, David Harwath, Tyler Miller, Christopher Song, James R. Glass, 
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units.
ICASSP2020
Jennifer Drexler, James R. Glass, 
Learning a Subword Inventory Jointly with End-to-End Automatic Speech Recognition.
ICASSP2020
Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, David Harwath, James R. Glass, 
Trilingual Semantic Embeddings of Visually Grounded Speech with Self-Attention Mechanisms.
SpeechComm2022
Chiranjeevi Yarra, Prasanta Kumar Ghosh, 
Automatic syllable stress detection under non-parallel label and data condition.
ICASSP2022
Aravind Illa, Aanish Nair, Prasanta Kumar Ghosh, 
The impact of cross language on acoustic-to-articulatory inversion and its influence on articulatory speech synthesis.
ICASSP2022
Abinay Reddy Naini, Bhavuk Singhal, Prasanta Kumar Ghosh, 
Dual Attention Pooling Network for Recording Device Classification Using Neutral and Whispered Speech.
ICASSP2022
Anwesha Roy, Varun Belagali, Prasanta Kumar Ghosh, 
An Error Correction Scheme for Improved Air-Tissue Boundary in Real-Time MRI Video for Speech Production.
Interspeech2022
Anish Bhanushali, Grant Bridgman, Deekshitha G, Prasanta Kumar Ghosh, Pratik Kumar, Saurabh Kumar, Adithya Raj Kolladath, Nithya Ravi, Aaditeshwar Seth, Ashish Seth, Abhayjeet Singh, Vrunda N. Sukhadia, Srinivasan Umesh, Sathvik Udupa, Lodagala V. S. V. Durga Prasad, 
Gram Vaani ASR Challenge on spontaneous telephone speech recordings in regional variations of Hindi.
Interspeech2022
Anwesha Roy, Varun Belagali, Prasanta Kumar Ghosh, 
Air tissue boundary segmentation using regional loss in real-time Magnetic Resonance Imaging video for speech production.
Interspeech2022
C. Siddarth, Sathvik Udupa, Prasanta Kumar Ghosh, 
Watch Me Speak: 2D Visualization of Human Mouth during Speech.
Interspeech2022
Sathvik Udupa, Aravind Illa, Prasanta Kumar Ghosh, 
Streaming model for Acoustic to Articulatory Inversion with transformer networks.
ICASSP2021
Tanuka Bhattacharjee, Jhansi Mallela, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Pradeep Reddy, Dipanjan Gope, Prasanta Kumar Ghosh, 
Effect of Noise and Model Complexity on Detection of Amyotrophic Lateral Sclerosis and Parkinson's Disease Using Pitch and MFCC.
ICASSP2021
Sarthak Kumar Maharana, Aravind Illa, Renuka Mannem, Yamini Belur, Preetie Shetty, Preethish-Kumar Veeramani, Seena Vengalil, Kiran Polavarapu, Atchayaram Nalini, Prasanta Kumar Ghosh, 
Acoustic-to-Articulatory Inversion for Dysarthric Speech by Using Cross-Corpus Acoustic-Articulatory Data.
ICASSP2021
Tilak Purohit, Achuth Rao M. V, Prasanta Kumar Ghosh, 
Impact of Speaking Rate on the Source Filter Interaction in Speech: A Study.
Interspeech2021
Tanuka Bhattacharjee, Jhansi Mallela, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Pradeep Reddy, Dipanjan Gope, Prasanta Kumar Ghosh, 
Source and Vocal Tract Cues for Speech-Based Classification of Patients with Parkinson's Disease and Healthy Subjects.
Interspeech2021
Anuj Diwan, Rakesh Vaideeswaran, Sanket Shah, Ankita Singh, Srinivasa Raghavan K. M., Shreya Khare, Vinit Unni, Saurabh Vyas, Akash Rajpuria, Chiranjeevi Yarra, Ashish R. Mittal, Prasanta Kumar Ghosh, Preethi Jyothi, Kalika Bali, Vivek Seshadri, Sunayana Sitaram, Samarth Bharadwaj, Jai Nanavati, Raoul Nanavati, Karthik Sankaranarayanan, 
MUCS 2021: Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages.
Interspeech2021
Ananya Muguli, Lancelot Pinto, Nirmala R., Neeraj Kumar Sharma 0001, Prashant Krishnan V, Prasanta Kumar Ghosh, Rohit Kumar, Shrirama Bhat, Srikanth Raj Chetupalli, Sriram Ganapathy, Shreyas Ramoji, Viral Nanda, 
DiCOVA Challenge: Dataset, Task, and Baseline System for COVID-19 Diagnosis Using Acoustics.
Interspeech2021
Manthan Sharma, Navaneetha Gaddam, Tejas Umesh, Aditya Murthy, Prasanta Kumar Ghosh, 
A Comparative Study of Different EMG Features for Acoustics-to-EMG Mapping.
Interspeech2021
Sathvik Udupa, Anwesha Roy, Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh, 
Estimating Articulatory Movements in Speech Production with Transformer Networks.
Interspeech2021
Sathvik Udupa, Anwesha Roy, Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh, 
Web Interface for Estimating Articulatory Movements in Speech Production from Acoustics and Text.
Interspeech2021
Chiranjeevi Yarra, Prasanta Kumar Ghosh, 
Noise Robust Pitch Stylization Using Minimum Mean Absolute Error Criterion.
ICASSP2020
Jhansi Mallela, Aravind Illa, Suhas B. N., Sathvik Udupa, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Pradeep Reddy, Dipanjan Gope, Prasanta Kumar Ghosh, 
Voice based classification of patients with Amyotrophic Lateral Sclerosis, Parkinson's Disease and Healthy Controls with CNN-LSTM using transfer learning.
ICASSP2020
Avni Rajpal, M. V. Achuth Rao, Chiranjeevi Yarra, Ritu Aggarwal, Prasanta Kumar Ghosh, 
Pseudo Likelihood Correction Technique for Low Resource Accented ASR.
SpeechComm2023
Jiangyan Yi, Jianhua Tao, Ye Bai, Zhengkun Tian, Cunhang Fan, 
Transfer knowledge for punctuation prediction via adversarial training.
SpeechComm2022
Wenhuan Lu, Xinyue Zhao, Na Guo, Yongwei Li, Jianguo Wei, Jianhua Tao, Jianwu Dang 0001, 
One-shot emotional voice conversion based on feature separation.
TASLP2022
Tao Wang 0074, Ruibo Fu, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, 
NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation.
TASLP2022
Tao Wang 0074, Jiangyan Yi, Ruibo Fu, Jianhua Tao, Zhengqi Wen, 
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing.
ICASSP2022
Cong Cai, Bin Liu 0041, Jianhua Tao, Zhengkun Tian, Jiahao Lu, Kexin Wang, 
End-to-End Network Based on Transformer for Automatic Detection of Covid-19.
ICASSP2022
Ya Li, Mingyue Niu, Ziping Zhao 0001, Jianhua Tao, 
Automatic Depression Level Assessment from Speech By Long-Term Global Information Embedding.
ICASSP2022
Tao Wang 0074, Jiangyan Yi, Liqun Deng, Ruibo Fu, Jianhua Tao, Zhengqi Wen, 
Context-Aware Mask Prediction Network for End-to-End Text-Based Speech Editing.
Interspeech2022
Shuai Zhang 0014, Jiangyan Yi, Zhengkun Tian, Jianhua Tao, Yu Ting Yeung, Liqun Deng, 
Reducing Multilingual Context Confusion for End-to-End Code-Switching Automatic Speech Recognition.
Interspeech2022
Jiahui Pan, Shuai Nie, Hui Zhang 0031, Shulin He, Kanghao Zhang, Shan Liang, Xueliang Zhang 0001, Jianhua Tao, 
Speaker recognition-assisted robust audio deepfake detection.
SpeechComm2021
Shan Liang, Guanjun Li, Shuai Nie, Zhanlei Yang, Wenju Liu, Jianhua Tao, 
Exploiting the directional coherence function for multichannel source extraction.
TASLP2021
Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang 0014, 
Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT.
TASLP2021
Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Zhengkun Tian, Shuai Zhang 0014, 
Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data.
TASLP2021
Cunhang Fan, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Bin Liu 0041, Zhengqi Wen, 
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition.
TASLP2021
Yongwei Li, Jianhua Tao, Donna Erickson, Bin Liu 0041, Masato Akagi, 
F0-Noise-Robust Glottal Source and Vocal Tract Analysis Based on ARX-LF Model.
TASLP2021
Zheng Lian, Bin Liu 0041, Jianhua Tao, 
CTNet: Conversational Transformer Network for Emotion Recognition.
ICASSP2021
Shuai Zhang 0014, Jiangyan Yi, Zhengkun Tian, Ye Bai, Jianhua Tao, Zhengqi Wen, 
Decoupling Pronunciation and Language for End-to-End Code-Switching Automatic Speech Recognition.
ICASSP2021
Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Tao Wang 0074, Chunyu Qiang, 
Bi-Level Style and Prosody Decoupling Modeling for Personalized End-to-End Speech Synthesis.
ICASSP2021
Licai Sun, Bin Liu 0041, Jianhua Tao, Zheng Lian, 
Multimodal Cross- and Self-Attention Network for Speech Emotion Recognition.
ICASSP2021
Tao Wang 0074, Ruibo Fu, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Chunyu Qiang, Shiming Wang, 
Prosody and Voice Factorization for Few-Shot Speaker Adaptation in the Challenge M2voc 2021.
Interspeech2021
Shuai Zhang 0014, Jiangyan Yi, Zhengkun Tian, Ye Bai, Jianhua Tao, Xuefei Liu, Zhengqi Wen, 
End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-Switching Speech Recognition.
SpeechComm2022
Takuma Okamoto, Keisuke Matsubara, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, 
Neural speech-rate conversion with multispeaker WaveNet vocoder.
ICASSP2022
Erica Cooper, Wen-Chin Huang, Tomoki Toda, Junichi Yamagishi, 
Generalization Ability of MOS Prediction Networks.
ICASSP2022
Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, 
An Investigation of Streaming Non-Autoregressive sequence-to-sequence Voice Conversion.
ICASSP2022
Wen-Chin Huang, Erica Cooper, Junichi Yamagishi, Tomoki Toda, 
LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech.
ICASSP2022
Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe 0001, Tomoki Toda, 
S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations.
ICASSP2022
Chao Xie, Yi-Chiao Wu, Patrick Lumban Tobing, Wen-Chin Huang, Tomoki Toda, 
Direct Noisy Speech Modeling for Noisy-To-Noisy Voice Conversion.
Interspeech2022
Yeonjong Choi, Chao Xie, Tomoki Toda, 
An Evaluation of Three-Stage Voice Conversion Framework for Noisy and Reverberant Conditions.
Interspeech2022
Wen-Chin Huang, Erica Cooper, Yu Tsao 0001, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi, 
The VoiceMOS Challenge 2022.
Interspeech2022
Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda, 
Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition.
Interspeech2022
Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda, 
Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation.
Interspeech2022
Daiki Yoshioka, Yusuke Yasuda, Noriyuki Matsunaga, Yamato Ohtani, Tomoki Toda, 
Spoken-Text-Style Transfer with Conditional Variational Autoencoder and Content Word Storage.
TASLP2021
Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda, 
Pretraining Techniques for Sequence-to-Sequence Voice Conversion.
TASLP2021
Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda, 
Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.
TASLP2021
Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda, 
Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.
ICASSP2021
Atsushi Ando, Ryo Masumura, Hiroshi Sato, Takafumi Moriya, Takanori Ashihara, Yusuke Ijima, Tomoki Toda, 
Speech Emotion Recognition Based on Listener Adaptive Models.
ICASSP2021
Wen-Chin Huang, Chia-Hua Wu, Shang-Bao Luo, Kuan-Yu Chen, Hsin-Min Wang, Tomoki Toda, 
Speech Recognition by Simply Fine-Tuning Bert.
ICASSP2021
Kazuhiro Kobayashi, Wen-Chin Huang, Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda, 
Crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder.
ICASSP2021
Keisuke Matsubara, Takuma Okamoto, Ryoichi Takashima, Tetsuya Takiguchi, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, 
High-Intelligibility Speech Synthesis for Dysarthric Speakers with LPCNet-Based TTS and CycleVAE-Based VC.
ICASSP2021
Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, 
Noise Level Limited Sub-Modeling for Diffusion Probabilistic Vocoders.
Interspeech2021
Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Ching-Feng Liu, Yu Tsao 0001, Hsin-Min Wang, Tomoki Toda, 
A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion.
ICASSP2022
Han Chen, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Self-Supervised Representation Learning for Unsupervised Anomalous Sound Detection Under Domain Shift.
ICASSP2022
Xing-Yu Chen, Qiu-Shi Zhu, Jie Zhang 0042, Li-Rong Dai 0001, 
Supervised and Self-Supervised Pretraining Based Covid-19 Detection Using Acoustic Breathing/Cough/Speech Signals.
ICASSP2022
Hang-Rui Hu, Yan Song 0001, Ying Liu, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Domain Robust Deep Embedding Learning for Speaker Recognition.
ICASSP2022
Yuxuan Xi, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Frontend Attributes Disentanglement for Speech Emotion Recognition.
ICASSP2022
Qiu-Shi Zhu, Jie Zhang 0042, Zi-qiang Zhang, Ming-Hui Wu, Xin Fang, Li-Rong Dai 0001, 
A Noise-Robust Self-Supervised Pre-Training Model Based Speech Representation Learning for Automatic Speech Recognition.
Interspeech2022
Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu 0001, Haizhou Li 0001, Tom Ko, Lirong Dai 0001, Jinyu Li 0001, Yao Qian, Furu Wei, 
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data.
Interspeech2022
Ye-Qian Du, Jie Zhang 0042, Qiu-Shi Zhu, Lirong Dai 0001, Ming-Hui Wu, Xin Fang, Zhou-Wang Yang, 
A Complementary Joint Training Approach Using Unpaired Speech and Text.
Interspeech2022
Hang-Rui Hu, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Class-Aware Distribution Alignment based Unsupervised Domain Adaptation for Speaker Verification.
Interspeech2022
Hai-tao Xu, Jie Zhang, Li-Rong Dai 0001, 
Differential Time-frequency Log-mel Spectrogram Features for Vision Transformer Based Infant Cry Recognition.
TASLP2021
Jian Tang, Jie Zhang 0042, Yan Song 0001, Ian McLoughlin 0001, Li-Rong Dai 0001, 
Multi-Granularity Sequence Alignment Mapping for Encoder-Decoder Based End-to-End ASR.
TASLP2021
Xiao Zhou, Zhen-Hua Ling, Li-Rong Dai 0001, 
UnitNet: A Sequence-to-Sequence Acoustic Model for Concatenative Speech Synthesis.
ICASSP2021
Ying Liu, Yan Song 0001, Ian McLoughlin 0001, Lin Liu, Li-Rong Dai 0001, 
An Effective Deep Embedding Learning Method Based on Dense-Residual Networks for Speaker Verification.
Interspeech2021
Hang Chen, Jun Du, Yu Hu 0003, Li-Rong Dai 0001, Bao-Cai Yin, Chin-Hui Lee, 
Automatic Lip-Reading with Hierarchical Pyramidal Convolution and Self-Attention for Image Sequences with No Word Boundaries.
Interspeech2021
Hui Wang, Lin Liu, Yan Song 0001, Lei Fang, Ian McLoughlin 0001, Li-Rong Dai 0001, 
A Weight Moving Average Based Alternate Decoupled Learning Algorithm for Long-Tailed Language Identification.
Interspeech2021
Xu Zheng, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
An Effective Mutual Mean Teaching Based Domain Adaptation Method for Sound Event Detection.
Interspeech2021
Xiao Zhou, Zhen-Hua Ling, Li-Rong Dai 0001, 
UnitNet-Based Hybrid Speech Synthesis.
Interspeech2021
Qiu-Shi Zhu, Jie Zhang 0042, Ming-Hui Wu, Xin Fang, Li-Rong Dai 0001, 
An Improved Wav2Vec 2.0 Pre-Training Approach Using Enhanced Local Dependency Modeling for Speech Recognition.
AAAI2021
Jing-Xuan Zhang, Korin Richmond, Zhen-Hua Ling, Lirong Dai 0001, 
TaLNet: Voice Reconstruction from Tongue and Lip Articulation with Transfer Learning from Text-to-Speech Synthesis.
TASLP2020
Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai 0001, 
Non-Parallel Sequence-to-Sequence Voice Conversion With Disentangled Linguistic and Speaker Representations.
ICASSP2020
Fenglin Ding, Wu Guo, Lirong Dai 0001, Jun Du, 
Attention-Based Gated Scaling Adaptive Acoustic Model for CTC-Based Speech Recognition.
TASLP2022
Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu 0001, Helen Meng, Hung-Yi Lee, 
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning.
ICASSP2022
Xueyuan Chen, Changhe Song, Yixuan Zhou, Zhiyong Wu 0001, Changbin Chen, Zhongqin Wu, Helen Meng, 
A Character-Level Span-Based Model for Mandarin Prosodic Structure Prediction.
ICASSP2022
Liyang Chen, Zhiyong Wu 0001, Jun Ling, Runnan Li, Xu Tan 0003, Sheng Zhao, 
Transformer-S2A: Robust and Efficient Speech-to-Animation.
ICASSP2022
Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.
ICASSP2022
Jingbei Li, Yi Meng, Chenyi Li, Zhiyong Wu 0001, Helen Meng, Chao Weng, Dan Su 0002, 
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling.
ICASSP2022
Jingbei Li, Yi Meng, Zhiyong Wu 0001, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang 0002, 
Neufa: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism.
ICASSP2022
Haibin Wu, Po-Chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang 0006, Zhiyong Wu 0001, Helen Meng, Hung-Yi Lee, 
Adversarial Sample Detection for Speaker Verification by Neural Vocoders.
ICASSP2022
Xixin Wu, Shoukang Hu, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Neural Architecture Search for Speech Emotion Recognition.
ICASSP2022
Wenxuan Ye, Shaoguang Mao, Frank K. Soong, Wenshan Wu, Yan Xia 0005, Jonathan Tien, Zhiyong Wu 0001, 
An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings.
ICASSP2022
Xintao Zhao, Feng Liu, Changhe Song, Zhiyong Wu 0001, Shiyin Kang, Deyi Tuo, Helen Meng, 
Disentangling Content and Fine-Grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion.
Interspeech2022
Jun Chen, Wei Rao, Zilin Wang, Zhiyong Wu 0001, Yannan Wang, Tao Yu, Shidong Shang, Helen Meng, 
Speech Enhancement with Fullband-Subband Cross-Attention Network.
Interspeech2022
Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu 0001, Helen Meng, 
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information.
Interspeech2022
Shun Lei, Yixuan Zhou, Liyang Chen, Jiankun Hu, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.
Interspeech2022
Xiang Li, Changhe Song, Xianhao Wei, Zhiyong Wu 0001, Jia Jia 0001, Helen Meng, 
Towards Cross-speaker Reading Style Transfer on Audiobook Dataset.
Interspeech2022
Yi Meng, Xiang Li, Zhiyong Wu 0001, Tingtian Li, Zixun Sun, Xinyu Xiao, Chi Sun, Hui Zhan, Helen Meng, 
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis.
Interspeech2022
Sicheng Yang, Methawee Tantrawenith, Haolin Zhuang, Zhiyong Wu 0001, Aolan Sun, Jianzong Wang, Ning Cheng 0001, Huaizhen Tang, Xintao Zhao, Jie Wang, Helen Meng, 
Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion.
Interspeech2022
Yang Zhang, Zhiqiang Lv, Haibin Wu, Shanshan Zhang, Pengfei Hu, Zhiyong Wu 0001, Hung-yi Lee, Helen Meng, 
MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification.
Interspeech2022
Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information.
Interspeech2022
Yixuan Zhou, Changhe Song, Xiang Li, Luwen Zhang, Zhiyong Wu 0001, Yanyao Bian, Dan Su 0002, Helen Meng, 
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis.
TASLP2021
Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Shiyin Kang, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Exemplar-Based Emotive Speech Synthesis.
TASLP2022
Shang-Yi Chuang, Hsin-Min Wang, Yu Tsao 0001, 
Improved Lite Audio-Visual Speech Enhancement.
TASLP2022
Yu-Chen Lin, Cheng Yu, Yi-Te Hsu, Szu-Wei Fu, Yu Tsao 0001, Tei-Wei Kuo, 
SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points.
ICASSP2022
Szu-Wei Fu, Cheng Yu, Kuo-Hsuan Hung, Mirco Ravanelli, Yu Tsao 0001, 
MetricGAN-U: Unsupervised Speech Enhancement/Dereverberation Based Only on Noisy/Reverberated Speech.
ICASSP2022
Kuan-Chen Wang, Kai-Chun Liu, Hsin-Min Wang, Yu Tsao 0001, 
EMGSE: Acoustic/EMG Fusion for Multimodal Speech Enhancement.
ICASSP2022
Haibin Wu, Heng-Cheng Kuo, Naijun Zheng, Kuo-Hsuan Hung, Hung-Yi Lee, Yu Tsao 0001, Hsin-Min Wang, Helen Meng, 
Partially Fake Audio Detection by Self-Attention-Based Fake Span Discovery.
ICASSP2022
Chao-Han Huck Yang, Jun Qi 0002, Samuel Yen-Chi Chen, Yu Tsao 0001, Pin-Yu Chen, 
When BERT Meets Quantum Temporal Convolution Learning for Text Classification in Heterogeneous Computing.
Interspeech2022
Rong Chao, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao 0001, 
Perceptual Contrast Stretching on Target Feature for Speech Enhancement.
Interspeech2022
Yu-Wen Chen, Yu Tsao 0001, 
InQSS: a speech intelligibility and quality assessment model using a multi-task learning network.
Interspeech2022
Wen-Chin Huang, Erica Cooper, Yu Tsao 0001, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi, 
The VoiceMOS Challenge 2022.
Interspeech2022
Kuo-Hsuan Hung, Szu-Wei Fu, Huan-Hsin Tseng, Hsin-Tien Chiang, Yu Tsao 0001, Chii-Wann Lin, 
Boosting Self-Supervised Embeddings for Speech Enhancement.
Interspeech2022
Chi-Chang Lee, Cheng-Hung Hu, Yu-Chen Lin, Chu-Song Chen, Hsin-Min Wang, Yu Tsao 0001, 
NASTAR: Noise Adaptive Speech Enhancement with Target-Conditional Resampling.
Interspeech2022
Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao 0001, Yanmin Qian, Shinji Watanabe 0001, 
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.
Interspeech2022
Chiang-Jen Peng, Yun-Ju Chan, Yih-Liang Shen, Cheng Yu, Yu Tsao 0001, Tai-Shih Chi, 
Perceptual Characteristics Based Multi-objective Model for Speech Enhancement.
Interspeech2022
Fan-Lin Wang, Hung-Shin Lee, Yu Tsao 0001, Hsin-Min Wang, 
Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks.
Interspeech2022
Cheng Yu, Szu-Wei Fu, Tsun-An Hsieh, Yu Tsao 0001, Mirco Ravanelli, 
OSSEM: one-shot speaker adaptive speech enhancement using meta learning.
Interspeech2022
Ryandhimas Edo Zezario, Fei Chen 0011, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao 0001, 
MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids.
Interspeech2022
Ryandhimas Edo Zezario, Szu-Wei Fu, Fei Chen 0011, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao 0001, 
MTI-Net: A Multi-Target Speech Intelligibility Prediction Model.
TASLP2021
Xugang Lu, Peng Shen, Yu Tsao 0001, Hisashi Kawai, 
Coupling a Generative Model With a Discriminative Learning Framework for Speaker Verification.
ICASSP2021
Xugang Lu, Peng Shen, Yu Tsao 0001, Hisashi Kawai, 
Unsupervised Neural Adaptation Model Based on Optimal Transport for Spoken Language Identification.
ICASSP2021
Yuan-Kuei Wu, Kuan-Po Huang, Yu Tsao 0001, Hung-yi Lee, 
One Shot Learning for Speech Separation.
ICASSP2022
Ke Hu, Tara N. Sainath, Arun Narayanan, Ruoming Pang, Trevor Strohman, 
Transducer-Based Streaming Deliberation for Cascaded Encoders.
ICASSP2022
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.
ICASSP2022
Chao Zhang, Bo Li 0028, Zhiyun Lu, Tara N. Sainath, Shuo-Yiin Chang, 
Improving the Fusion of Acoustic and Text Representations in RNN-T.
Interspeech2022
Shuo-Yiin Chang, Bo Li 0028, Tara N. Sainath, Chao Zhang, Trevor Strohman, Qiao Liang, Yanzhang He, 
Turn-Taking Prediction for Natural Conversational Speech.
Interspeech2022
Shuo-Yiin Chang, Guru Prakash, Zelin Wu, Tara N. Sainath, Bo Li 0028, Qiao Liang, Adam Stambler, Shyam Upadhyay, Manaal Faruqui, Trevor Strohman, 
Streaming Intended Query Detection using E2E Modeling for Continued Conversation.
Interspeech2022
Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang 0016, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman, 
A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes.
Interspeech2022
Ke Hu, Tara N. Sainath, Yanzhang He, Rohit Prabhavalkar, Trevor Strohman, Sepand Mavandadi, Weiran Wang, 
Improving Deliberation by Text-Only and Semi-Supervised Training.
Interspeech2022
W. Ronny Huang, Shuo-Yiin Chang, David Rybach, Tara N. Sainath, Rohit Prabhavalkar, Cal Peyser, Zhiyun Lu, Cyril Allauzen, 
E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR.
Interspeech2022
W. Ronny Huang, Cal Peyser, Tara N. Sainath, Ruoming Pang, Trevor D. Strohman, Shankar Kumar, 
Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition.
Interspeech2022
Bo Li 0028, Tara N. Sainath, Ruoming Pang, Shuo-Yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani, 
A Language Agnostic Multilingual Streaming On-Device ASR System.
Interspeech2022
Cal Peyser, W. Ronny Huang, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho, 
Towards Disentangled Speech Representations.
Interspeech2022
Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, W. Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach, 
Improving Rare Word Recognition with LM-aware MWER Training.
Interspeech2022
Weiran Wang, Ke Hu, Tara N. Sainath, 
Streaming Align-Refine for Non-autoregressive Deliberation.
Interspeech2022
Chao Zhang, Bo Li 0028, Tara N. Sainath, Trevor Strohman, Sepand Mavandadi, Shuo-Yiin Chang, Parisa Haghani, 
Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification.
ICASSP2021
Bo Li 0028, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han 0002, Qiao Liang, Yu Zhang 0033, Trevor Strohman, Yonghui Wu, 
A Better and Faster end-to-end Model for Streaming ASR.
ICASSP2021
David Qiu, Qiujia Li, Yanzhang He, Yu Zhang 0033, Bo Li 0028, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li 0133, Ke Hu, Tara N. Sainath, Ian McGraw, 
Learning Word-Level Confidence for Subword End-To-End ASR.
ICASSP2021
Harsh Shrivastava 0001, Ankush Garg, Yuan Cao 0007, Yu Zhang 0033, Tara N. Sainath, 
Echo State Speech Recognition.
ICASSP2021
Jiahui Yu, Chung-Cheng Chiu, Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han 0002, Anmol Gulati, Yonghui Wu, Ruoming Pang, 
FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.
Interspeech2021
Rami Botros, Tara N. Sainath, Robert David, Emmanuel Guzman, Wei Li 0133, Yanzhang He, 
Tied & Reduced RNN-T Decoder.
Interspeech2021
W. Ronny Huang, Tara N. Sainath, Cal Peyser, Shankar Kumar, David Rybach, Trevor Strohman, 
Lookup-Table Recurrent Language Models for Long Tail Speech Recognition.
SpeechComm2022
Heting Gao, Xiaoxuan Wang, Sunghun Kang, Rusty Mina, Dias Issa, John B. Harvill, Leda Sari, Mark Hasegawa-Johnson, Chang D. Yoo, 
Seamless equal accuracy ratio for inclusive CTC speech recognition.
TASLP2022
Jialu Li, Mark Hasegawa-Johnson, 
Autosegmental Neural Nets 2.0: An Extensive Study of Training Synchronous and Asynchronous Phones and Tones for Under-Resourced Tonal Languages.
ICASSP2022
Chak Ho Chan, Kaizhi Qian, Yang Zhang 0001, Mark Hasegawa-Johnson, 
SpeechSplit2.0: Unsupervised Speech Disentanglement for Voice Conversion without Tuning Autoencoder Bottlenecks.
ICASSP2022
John B. Harvill, Yash R. Wani, Moitreya Chatterjee, Mustafa Alam, David G. Beiser, David Chestek, Mark Hasegawa-Johnson, Narendra Ahuja, 
Detection of Covid-19 from Joint Time and Frequency Analysis of Speech, Breathing and Cough Audio.
Interspeech2022
Heting Gao, Junrui Ni, Kaizhi Qian, Yang Zhang 0001, Shiyu Chang, Mark Hasegawa-Johnson, 
WavPrompt: Towards Few-Shot Spoken Language Understanding with Frozen Language Models.
Interspeech2022
John B. Harvill, Mark Hasegawa-Johnson, Chang D. Yoo, 
Frame-Level Stutter Detection.
Interspeech2022
Mahir Morshed, Mark Hasegawa-Johnson, 
Cross-lingual articulatory feature information transfer for speech recognition using recurrent progressive neural networks.
Interspeech2022
Junrui Ni, Liming Wang, Heting Gao, Kaizhi Qian, Yang Zhang 0001, Shiyu Chang, Mark Hasegawa-Johnson, 
Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition.
ICML2022
Kaizhi Qian, Yang Zhang 0001, Heting Gao, Junrui Ni, Cheng-I Lai, David D. Cox, Mark Hasegawa-Johnson, Shiyu Chang, 
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers.
ACL2022
Liming Wang, Siyuan Feng, Mark Hasegawa-Johnson, Chang Dong Yoo, 
Self-supervised Semantic-driven Phoneme Discovery for Zero-resource Speech Recognition.
SpeechComm2021
Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain, 
Analysis of acoustic and voice quality features for the classification of infant and mother vocalizations.
TASLP2021
Leda Sari, Mark Hasegawa-Johnson, Samuel Thomas 0001, 
Auxiliary Networks for Joint Speaker Adaptation and Speaker Change Detection.
TASLP2021
Leda Sari, Mark Hasegawa-Johnson, Chang D. Yoo, 
Counterfactually Fair Automatic Speech Recognition.
ICASSP2021
Siyuan Feng 0001, Piotr Zelasko, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak, 
How Phonotactics Affect Multilingual and Zero-Shot ASR Performance.
ICASSP2021
John B. Harvill, Dias Issa, Mark Hasegawa-Johnson, Chang Dong Yoo, 
Synthesis of New Words for Improved Dysarthric Speech Recognition on an Expanded Vocabulary.
ICASSP2021
Xinsheng Wang, Siyuan Feng 0001, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg, 
Show and Speak: Directly Synthesize Spoken Description of Images.
ICASSP2021
Junzhe Zhu, Mark Hasegawa-Johnson, Nancy L. McElwain, 
A Comparison Study on Infant-Parent Voice Diarization.
ICASSP2021
Junzhe Zhu, Raymond A. Yeh, Mark Hasegawa-Johnson, 
Multi-Decoder DPRNN: Source Separation for Variable Number of Speakers.
Interspeech2021
Heting Gao, Junrui Ni, Yang Zhang 0001, Kaizhi Qian, Shiyu Chang, Mark Hasegawa-Johnson, 
Zero-Shot Cross-Lingual Phonetic Recognition with External Language Embedding.
Interspeech2021
John B. Harvill, Yash R. Wani, Mark Hasegawa-Johnson, Narendra Ahuja, David G. Beiser, David Chestek, 
Classification of COVID-19 from Cough Using Autoregressive Predictive Coding Pretraining and Spectral Data Augmentation.
TASLP2022
Lucas Ondel, Bolaji Yusuf, Lukás Burget, Murat Saraçlar, 
Non-Parametric Bayesian Subspace Models for Acoustic Unit Discovery.
ICASSP2022
Jiangyu Han, Yanhua Long, Lukás Burget, Jan Cernocký, 
DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation and Extraction.
ICASSP2022
Ladislav Mosner, Oldrich Plchot, Lukás Burget, Jan Honza Cernocký, 
Multisv: Dataset for Far-Field Multi-Channel Speaker Verification.
Interspeech2022
Murali Karthick Baskar, Tim Herzig, Diana Nguyen, Mireia Díez, Tim Polzehl, Lukás Burget, Jan Cernocký, 
Speaker adaptation for Wav2vec2 based dysarthric ASR.
Interspeech2022
Niko Brummer, Albert Swart, Ladislav Mosner, Anna Silnova, Oldrich Plchot, Themos Stafylakis, Lukás Burget, 
Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings.
Interspeech2022
Martin Kocour, Katerina Zmolíková, Lucas Ondel, Jan Svec, Marc Delcroix, Tsubasa Ochiai, Lukás Burget, Jan Cernocký, 
Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model.
Interspeech2022
Federico Landini, Alicia Lozano-Diez, Mireia Díez, Lukás Burget, 
From Simulated Mixtures to Simulated Conversations as Training Data for End-to-End Neural Diarization.
Interspeech2022
Junyi Peng, Rongzhi Gu, Ladislav Mosner, Oldrich Plchot, Lukás Burget, Jan Cernocký, 
Learnable Sparse Filterbank for Speaker Verification.
Interspeech2022
Themos Stafylakis, Ladislav Mosner, Oldrich Plchot, Johan Rohdin, Anna Silnova, Lukás Burget, Jan Cernocký, 
Training speaker embedding extractors using multi-speaker audio with unknown speaker boundaries.
ICASSP2021
Murali Karthick Baskar, Lukás Burget, Shinji Watanabe 0001, Ramón Fernandez Astudillo, Jan Honza Cernocký, 
Eat: Enhanced ASR-TTS for Self-Supervised Speech Recognition.
ICASSP2021
Federico Landini, Ondrej Glembek, Pavel Matejka, Johan Rohdin, Lukás Burget, Mireia Díez, Anna Silnova, 
Analysis of the BUT Diarization System for VoxConverse Challenge.
ICASSP2021
Hari Krishna Vydana, Martin Karafiát, Katerina Zmolíková, Lukás Burget, Honza Cernocký, 
Jointly Trained Transformers Models for Spoken Language Translation.
ICASSP2021
Bolaji Yusuf, Lucas Ondel, Lukás Burget, Jan Cernocký, Murat Saraçlar, 
A Hierarchical Subspace Model for Language-Attuned Acoustic Unit Discovery.
Interspeech2021
Karel Benes, Lukás Burget, 
Text Augmentation for Language Models in High Error Recognition Scenario.
Interspeech2021
Ekaterina Egorova, Hari Krishna Vydana, Lukás Burget, Jan Cernocký, 
Out-of-Vocabulary Words Detection with Attention and CTC Alignments in an End-to-End ASR System.
Interspeech2021
Junyi Peng, Xiaoyang Qu, Rongzhi Gu, Jianzong Wang, Jing Xiao 0006, Lukás Burget, Jan Cernocký, 
Effective Phase Encoding for End-To-End Speaker Verification.
Interspeech2021
Junyi Peng, Xiaoyang Qu, Jianzong Wang, Rongzhi Gu, Jing Xiao 0006, Lukás Burget, Jan Cernocký, 
ICSpk: Interpretable Complex Speaker Embedding Extractor from Raw Waveform.
Interspeech2021
Themos Stafylakis, Johan Rohdin, Lukás Burget, 
Speaker Embeddings by Modeling Channel-Wise Correlations.
TASLP2020
Mireia Díez, Lukás Burget, Federico Landini, Jan Cernocký, 
Analysis of Speaker Diarization Based on Bayesian HMM With Eigenvoice Priors.
TASLP2020
Santosh Kesiraju, Oldrich Plchot, Lukás Burget, Suryakanth V. Gangashetty, 
Learning Document Embeddings Along With Their Uncertainties.
Interspeech2022
Jaejin Cho, Raghavendra Pappagari, Piotr Zelasko, Laureano Moro-Velázquez, Jesús Villalba 0001, Najim Dehak, 
Non-contrastive self-supervised learning of utterance-level speech representations.
Interspeech2022
Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesús Villalba 0001, Sanjeev Khudanpur, Najim Dehak, 
Defense against Adversarial Attacks on Hybrid Speech Recognition System using Adversarial Fine-tuning with Denoiser.
Interspeech2022
Sonal Joshi, Saurabh Kataria, Jesús Villalba 0001, Najim Dehak, 
AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification.
Interspeech2022
Saurabh Kataria, Jesús Villalba 0001, Laureano Moro-Velázquez, Najim Dehak, 
Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification.
Interspeech2022
Magdalena Rybicka, Jesús Villalba 0001, Najim Dehak, Konrad Kowalczyk, 
End-to-End Neural Speaker Diarization with an Iterative Refinement of Non-Autoregressive Attention-based Attractors.
Interspeech2022
Yiwen Shao, Jesús Villalba 0001, Sonal Joshi, Saurabh Kataria, Sanjeev Khudanpur, Najim Dehak, 
Chunking Defense for Adversarial Attacks on ASR.
ICASSP2021
Nanxin Chen, Piotr Zelasko, Jesús Villalba 0001, Najim Dehak, 
Focus on the Present: A Regularization Method for the ASR Source-Target Attention Layer.
ICASSP2021
Jaejin Cho, Piotr Zelasko, Jesús Villalba 0001, Najim Dehak, 
Improving Reconstruction Loss Based Speaker Embedding in Unsupervised and Semi-Supervised Scenarios.
ICASSP2021
Siyuan Feng 0001, Piotr Zelasko, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak, 
How Phonotactics Affect Multilingual and Zero-Shot ASR Performance.
ICASSP2021
Saurabh Kataria, Jesús Villalba 0001, Najim Dehak, 
Perceptual Loss Based Speech Denoising with an Ensemble of Audio Pattern Recognition and Self-Supervised Models.
ICASSP2021
Raghavendra Pappagari, Jesús Villalba 0001, Piotr Zelasko, Laureano Moro-Velázquez, Najim Dehak, 
CopyPaste: An Augmentation Method for Speech Emotion Recognition.
Interspeech2021
Saurabhchand Bhati, Jesús Villalba 0001, Piotr Zelasko, Laureano Moro-Velázquez, Najim Dehak, 
Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation.
Interspeech2021
Nanxin Chen, Piotr Zelasko, Laureano Moro-Velázquez, Jesús Villalba 0001, Najim Dehak, 
Align-Denoise: Single-Pass Non-Autoregressive Speech Recognition.
Interspeech2021
Nanxin Chen, Yu Zhang 0033, Heiga Zen, Ron J. Weiss, Mohammad Norouzi 0002, Najim Dehak, William Chan, 
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis.
Interspeech2021
Saurabh Kataria, Jesús Villalba 0001, Piotr Zelasko, Laureano Moro-Velázquez, Najim Dehak, 
Deep Feature CycleGANs: Speaker Identity Preserving Non-Parallel Microphone-Telephone Domain Adaptation for Speaker Verification.
Interspeech2021
Raghavendra Pappagari, Jaejin Cho, Sonal Joshi, Laureano Moro-Velázquez, Piotr Zelasko, Jesús Villalba 0001, Najim Dehak, 
Automatic Detection and Assessment of Alzheimer Disease Using Speech and Language Technologies in Low-Resource Scenarios.
Interspeech2021
Magdalena Rybicka, Jesús Villalba 0001, Piotr Zelasko, Najim Dehak, Konrad Kowalczyk, 
Spine2Net: SpineNet with Res2Net and Time-Squeeze-and-Excitation Blocks for Speaker Recognition.
Interspeech2021
Jesús Villalba 0001, Sonal Joshi, Piotr Zelasko, Najim Dehak, 
Representation Learning to Classify and Detect Adversarial Attacks Against Speaker and Speech Recognition Systems.
TASLP2020
Laureano Moro-Velázquez, Estefanía Hernández-García, Jorge Andrés Gómez García, Juan Ignacio Godino-Llorente, Najim Dehak, 
Analysis of the Effects of Supraglottal Tract Surgical Procedures in Automatic Speaker Recognition Performance.
ICASSP2020
Saurabh Kataria, Phani Sankar Nidadavolu, Jesús Villalba 0001, Nanxin Chen, L. Paola García-Perera, Najim Dehak, 
Feature Enhancement with Deep Feature Losses for Speaker Verification.
ICASSP2022
Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe 0001, 
Integrating Multiple ASR Systems into NLP Backend with Attention Fusion.
ICASSP2022
Keisuke Kinoshita, Marc Delcroix, Tomoharu Iwata, 
Tight Integration Of Neural- And Clustering-Based Diarization Through Deep Unfolding Of Infinite Gaussian Mixture Model.
ICASSP2022
Takafumi Moriya, Takanori Ashihara, Atsushi Ando, Hiroshi Sato, Tomohiro Tanaka, Kohei Matsuura, Ryo Masumura, Marc Delcroix, Takahiro Shinozaki, 
Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration.
ICASSP2022
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Naoyuki Kamo, Takafumi Moriya, 
Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition.
Interspeech2022
Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolíková, Hiroshi Sato, Tomohiro Nakatani, 
Listen only to me! How well can target speech extraction handle false alarms?
Interspeech2022
Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri, 
How bad are artifacts?: Analyzing the impact of speech enhancement errors on ASR.
Interspeech2022
Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Christoph Böddeker, Reinhold Haeb-Umbach, 
Utterance-by-utterance overlap-aware neural diarization with Graph-PIT.
Interspeech2022
Martin Kocour, Katerina Zmolíková, Lucas Ondel, Jan Svec, Marc Delcroix, Tsubasa Ochiai, Lukás Burget, Jan Cernocký, 
Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model.
Interspeech2022
Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki, 
Streaming Target-Speaker ASR with Neural Transducer.
Interspeech2022
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura, 
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.
ICASSP2021
Christoph Böddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Reinhold Haeb-Umbach, 
Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.
ICASSP2021
Marc Delcroix, Katerina Zmolíková, Tsubasa Ochiai, Keisuke Kinoshita, Tomohiro Nakatani, 
Speaker Activity Driven Neural Speech Extraction.
ICASSP2021
Chenda Li, Zhuo Chen 0006, Yi Luo 0004, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe 0001, Yanmin Qian, 
Dual-Path Modeling for Long Recording Speech Separation in Meetings.
ICASSP2021
Atsunori Ogawa, Naohiro Tawara, Takatomo Kano, Marc Delcroix, 
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition.
ICASSP2021
Julio Wissing, Benedikt T. Boenninghoff, Dorothea Kolossa, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Christopher Schymura, 
Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain.
Interspeech2021
Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Shoko Araki, 
Few-Shot Learning of New Sound Classes for Target Sound Extraction.
Interspeech2021
Cong Han, Yi Luo 0004, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe 0001, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen 0006, 
Continuous Speech Separation Using Speaker Inventory for Long Recording.
Interspeech2021
Keisuke Kinoshita, Marc Delcroix, Naohiro Tawara, 
Advances in Integration of End-to-End Neural and Clustering-Based Diarization for Real Conversational Speech.
Interspeech2021
Takafumi Moriya, Tomohiro Tanaka, Takanori Ashihara, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Ryo Masumura, Marc Delcroix, Taichi Asami, 
Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture.
Interspeech2021
Thilo von Neumann, Keisuke Kinoshita, Christoph Böddeker, Marc Delcroix, Reinhold Haeb-Umbach, 
Graph-PIT: Generalized Permutation Invariant Training for Continuous Separation of Arbitrary Numbers of Speakers.
ICASSP2022
Zili Huang, Shinji Watanabe 0001, Shu-Wen Yang, Paola García, Sanjeev Khudanpur, 
Investigating Self-Supervised Learning for Speech Enhancement and Separation.
Interspeech2022
Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesús Villalba 0001, Sanjeev Khudanpur, Najim Dehak, 
Defense against Adversarial Attacks on Hybrid Speech Recognition System using Adversarial Fine-tuning with Denoiser.
Interspeech2022
Hexin Liu, Leibny Paola García-Perera, Andy W. H. Khong, Suzy J. Styles, Sanjeev Khudanpur, 
PHO-LID: A Unified Model Incorporating Acoustic-Phonetic and Phonotactic Information for Language Identification.
Interspeech2022
Yiwen Shao, Jesús Villalba 0001, Sonal Joshi, Saurabh Kataria, Sanjeev Khudanpur, Najim Dehak, 
Chunking Defense for Adversarial Attacks on ASR.
ICASSP2021
Hang Lv 0001, Zhehuai Chen, Hainan Xu, Daniel Povey, Lei Xie 0001, Sanjeev Khudanpur, 
An Asynchronous WFST-Based Decoder for Automatic Speech Recognition.
ICASSP2021
Ke Li 0018, Daniel Povey, Sanjeev Khudanpur, 
A Parallelizable Lattice Rescoring Strategy with Neural Language Models.
ICASSP2021
Matthew Maciejewski, Jing Shi 0003, Shinji Watanabe 0001, Sanjeev Khudanpur, 
Training Noisy Single-Channel Speech Separation with Noisy Oracle Sources: A Large Gap and a Small Step.
ICASSP2021
Yiming Wang, Hang Lv 0001, Daniel Povey, Lei Xie 0001, Sanjeev Khudanpur, 
Wake Word Detection with Streaming Transformers.
Interspeech2021
Guoguo Chen, Shuzhou Chai, Guan-Bo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su 0002, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe 0001, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Zhao You, Zhiyong Yan, 
GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10,000 Hours of Transcribed Audio.
Interspeech2021
Hexin Liu, Leibny Paola García-Perera, Xinyi Zhang, Justin Dauwels, Andy W. H. Khong, Sanjeev Khudanpur, Suzy J. Styles, 
End-to-End Language Diarization for Bilingual Code-Switching Speech.
Interspeech2021
Matthew Maciejewski, Shinji Watanabe 0001, Sanjeev Khudanpur, 
Speaker Verification-Based Evaluation of Single-Channel Speech Separation.
Interspeech2021
Desh Raj, Sanjeev Khudanpur, 
Reformulating DOVER-Lap Label Mapping as a Graph Partitioning Problem.
Interspeech2021
Matthew Wiesner, Mousmita Sarma, Ashish Arora, Desh Raj, Dongji Gao, Ruizhe Huang, Supreet Preet, Moris Johnson, Zikra Iqbal, Nagendra Goel, Jan Trmal, Leibny Paola García-Perera, Sanjeev Khudanpur, 
Training Hybrid Models on Noisy Transliterated Transcripts for Code-Switched Speech Recognition.
ICASSP2020
Ke Li 0018, Zhe Liu 0011, Tianxing He, Hongzhao Huang, Fuchun Peng, Daniel Povey, Sanjeev Khudanpur, 
An Empirical Study of Transformer-Based Neural Language Model Adaptation.
Interspeech2020
Pegah Ghahramani, Hossein Hadian, Daniel Povey, Hynek Hermansky, Sanjeev Khudanpur, 
An Alternative to MFCCs for ASR.
Interspeech2020
Ruizhe Huang, Ke Li 0018, Ashish Arora, Daniel Povey, Sanjeev Khudanpur, 
Efficient MDI Adaptation for n-Gram Language Models.
Interspeech2020
Ke Li 0018, Daniel Povey, Sanjeev Khudanpur, 
Neural Language Modeling with Implicit Cache Pointers.
Interspeech2020
Yiwen Shao, Yiming Wang, Daniel Povey, Sanjeev Khudanpur, 
PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR.
Interspeech2020
Yiming Wang, Hang Lv 0001, Daniel Povey, Lei Xie 0001, Sanjeev Khudanpur, 
Wake Word Detection with Alignment-Free Lattice-Free MMI.
ICASSP2019
Vimal Manohar, Szu-Jui Chen, Zhiqi Wang, Yusuke Fujita, Shinji Watanabe 0001, Sanjeev Khudanpur, 
Acoustic Modeling for Overlapping Speech Recognition: Jhu Chime-5 Challenge System.
SpeechComm2022
Bo Chen, Zhihang Xu, Kai Yu 0004, 
Data augmentation based non-parallel voice conversion with frame-level speaker disentangler.
TASLP2022
Bo Chen, Chenpeng Du, Kai Yu 0004, 
Neural Fusion for Voice Cloning.
TASLP2022
Chenpeng Du, Kai Yu 0004, 
Phone-Level Prosody Modelling With GMM-Based MDN for Diverse and Controllable Speech Synthesis.
ICASSP2022
Lingfeng Dai, Lu Chen 0002, Zhikai Zhou, Kai Yu 0004, 
LatticeBART: Lattice-to-Lattice Pre-Training for Speech Recognition.
ICASSP2022
Yiwei Guo, Chenpeng Du, Kai Yu 0004, 
Unsupervised Word-Level Prosody Tagging for Controllable Speech Synthesis.
ICASSP2022
Guangwei Li, Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu 0004, 
Category-Adapted Sound Event Enhancement with Weakly Labeled Data.
ICASSP2022
Siyu Lou, Xuenan Xu, Mengyue Wu, Kai Yu 0004, 
Audio-Text Retrieval in Context.
ICASSP2022
Yu Xi, Tian Tan 0002, Wangyou Zhang, Baochen Yang, Kai Yu 0004, 
Text Adaptive Detection for Customizable Keyword Spotting.
ICASSP2022
Xuenan Xu, Mengyue Wu, Kai Yu 0004, 
Diversity-Controllable and Accurate Audio Captioning Based on Neural Condition.
Interspeech2022
Chenpeng Du, Yiwei Guo, Xie Chen 0001, Kai Yu 0004, 
VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature.
Interspeech2022
Tao Liu, Shuai Fan 0005, Xu Xiang, Hongbo Song, Shaoxiong Lin, Jiaqi Sun, Tianyuan Han, Siyuan Chen, Binwei Yao, Sen Liu, Yifei Wu, Yanmin Qian, Kai Yu 0004, 
MSDWild: Multi-modal Speaker Diarization Dataset in the Wild.
TASLP2021
Heinrich Dinkel, Shuai Wang 0016, Xuenan Xu, Mengyue Wu, Kai Yu 0004, 
Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training.
ICASSP2021
Chenpeng Du, Bing Han, Shuai Wang 0016, Yanmin Qian, Kai Yu 0004, 
SynAug: Synthesis-Based Data Augmentation for Text-Dependent Speaker Verification.
ICASSP2021
Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu 0004, 
Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events.
Interspeech2021
Lingfeng Dai, Qi Liu, Kai Yu 0004, 
Class-Based Neural Network Language Model for Second-Pass Rescoring in ASR.
Interspeech2021
Chenpeng Du, Kai Yu 0004, 
Rich Prosody Diversity Modelling with Phone-Level Mixture Density Network.
Interspeech2021
Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu 0004, 
A Lightweight Framework for Online Voice Activity Detection in the Wild.
TASLP2020
Shuai Wang 0016, Yexin Yang, Zhanghao Wu, Yanmin Qian, Kai Yu 0004, 
Data Augmentation Using Deep Generative Models for Embedding Based Speaker Recognition.
TASLP2020
Kai Yu 0004, Rao Ma, Kaiyu Shi, Qi Liu 0018, 
Neural Network Language Model Compression With Product Quantization and Soft Binarization.
TASLP2020
Su Zhu, Ruisheng Cao, Kai Yu 0004, 
Dual Learning for Semi-Supervised Natural Language Understanding.
ICASSP2022
Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
The First Multimodal Information Based Speech Processing (Misp) Challenge: Data, Tasks, Baselines And Results.
ICASSP2022
Maokui He, Xiang Lv, Weilin Zhou, Jingjing Yin, Xiaoqi Zhang, Yuxuan Wang, Shutong Niu, Yuhang Cao, Heng Lu, Jun Du, Chin-Hui Lee, 
The USTC-Ximalaya System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription (M2met) Challenge.
Interspeech2022
Hang Chen, Jun Du, Yusheng Dai, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Baocai Yin, Jia Pan, 
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.
Interspeech2022
Mao-Kui He, Jun Du, Chin-Hui Lee, 
End-to-End Audio-Visual Neural Speaker Diarization.
Interspeech2022
Yajian Wang, Jun Du, Hang Chen, Qing Wang 0008, Chin-Hui Lee, 
Deep Segment Model for Acoustic Scene Classification.
Interspeech2022
Hengshun Zhou, Jun Du, Gongzhen Zou, Zhaoxu Nian, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Shifu Xiong, Jianqing Gao, 
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis.
TASLP2021
Li Chai 0002, Jun Du, Qing-Feng Liu, Chin-Hui Lee, 
A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement.
ICASSP2021
Zhaoxu Nian, Yan-Hui Tu, Jun Du, Chin-Hui Lee, 
A Progressive Learning Approach to Adaptive Noise and Speech Estimation for Speech Enhancement and Noisy Speech Recognition.
ICASSP2021
Chao-Han Huck Yang, Jun Qi 0002, Samuel Yen-Chi Chen, Pin-Yu Chen, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee, 
Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition.
Interspeech2021
Hang Chen, Jun Du, Yu Hu 0003, Li-Rong Dai 0001, Bao-Cai Yin, Chin-Hui Lee, 
Automatic Lip-Reading with Hierarchical Pyramidal Convolution and Self-Attention for Image Sequences with No Word Boundaries.
Interspeech2021
Yu-Xuan Wang, Jun Du, Maokui He, Shutong Niu, Lei Sun, Chin-Hui Lee, 
Scenario-Dependent Speaker Diarization for DIHARD-III Challenge.
Interspeech2021
Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee, 
PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification.
Interspeech2021
Xiaoqi Zhang, Jun Du, Li Chai 0002, Chin-Hui Lee, 
A Maximum Likelihood Approach to SNR-Progressive Learning Using Generalized Gaussian Distribution for LSTM-Based Speech Enhancement.
Interspeech2021
Hengshun Zhou, Jun Du, Hang Chen, Zijun Jing, Shifu Xiong, Chin-Hui Lee, 
Audio-Visual Information Fusion Using Cross-Modal Teacher-Student Learning for Voice Activity Detection in Realistic Environments.
TASLP2020
Yanhui Tu, Jun Du, Tian Gao, Chin-Hui Lee, 
A Multi-Target SNR-Progressive Learning Approach to Regression Based Speech Enhancement.
ICASSP2020
Zhong Meng, Hu Hu, Jinyu Li 0001, Changliang Liu, Yan Huang 0028, Yifan Gong 0001, Chin-Hui Lee, 
L-Vector: Neural Label Embedding for Domain Adaptation.
ICASSP2020
Shutong Niu, Jun Du, Li Chai 0002, Chin-Hui Lee, 
A Maximum Likelihood Approach to Multi-Objective Learning Using Generalized Gaussian Distributions for Dnn-Based Speech Enhancement.
ICASSP2020
Jun Qi 0002, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee, 
Tensor-To-Vector Regression for Multi-Channel Speech Enhancement Based on Tensor-Train Network.
ICASSP2020
Yanhui Tu, Jun Du, Chin-Hui Lee, 
2D-to-2D Mask Estimation for Speech Enhancement Based on Fully Convolutional Neural Network.
ICASSP2020
Xin Wang 0037, Jun Du, Alejandrina Cristià, Lei Sun, Chin-Hui Lee, 
A Study of Child Speech Extraction Using Joint Speech Enhancement and Separation in Realistic Conditions.
SpeechComm2022
Lili Guo, Longbiao Wang, Jianwu Dang 0001, Eng Siong Chng, Seiichi Nakagawa, 
Learning affective representations based on magnitude and dynamic relative phase information for speech emotion recognition.
SpeechComm2022
Wenhuan Lu, Xinyue Zhao, Na Guo, Yongwei Li, Jianguo Wei, Jianhua Tao, Jianwu Dang 0001, 
One-shot emotional voice conversion based on feature separation.
ICASSP2022
Yuan Gao, Shogo Okada, Longbiao Wang, Jiaxing Liu, Jianwu Dang 0001, 
Domain-Invariant Feature Learning for Cross Corpus Speech Emotion Recognition.
ICASSP2022
Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang 0001, Haizhou Li 0001, 
L-SpEx: Localized Target Speaker Extraction.
ICASSP2022
Cheng Gong, Longbiao Wang, Zhenhua Ling, Ju Zhang, Jianwu Dang 0001, 
Using Multiple Reference Audios and Style Embedding Constraints for Speech Synthesis.
ICASSP2022
Yongjie Lv, Longbiao Wang, Meng Ge, Sheng Li 0010, Chenchen Ding, Lixin Pan, Yuguang Wang, Jianwu Dang 0001, Kiyoshi Honda, 
Compressing Transformer-Based ASR Model by Task-Driven Loss and Attention-Based Multi-Level Feature Distillation.
ICASSP2022
Yaodong Song, Jiaxing Liu, Longbiao Wang, Ruiguo Yu, Jianwu Dang 0001, 
Multi-Stage Graph Representation Learning for Dialogue-Level Speech Emotion Recognition.
ICASSP2022
Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, Hui Chen, 
Learning Domain-Invariant Transformation for Speaker Verification.
ICASSP2022
Xiangyu Zhao, Longbiao Wang, Jianwu Dang 0001, 
Improving Dialogue Generation via Proactively Querying Grounded Knowledge.
Interspeech2022
Yanjie Fu, Meng Ge, Haoran Yin, Xinyuan Qian, Longbiao Wang, Gaoyan Zhang, Jianwu Dang 0001, 
Iterative Sound Source Localization for Unknown Number of Sources.
Interspeech2022
Jiaxu He, Cheng Gong, Longbiao Wang, Di Jin 0001, Xiaobao Wang, Junhai Xu, Jianwu Dang 0001, 
Improve emotional speech synthesis quality by learning explicit and implicit representations with semi-supervised training.
Interspeech2022
Kai Li 0018, Sheng Li 0010, Xugang Lu, Masato Akagi, Meng Liu, Lin Zhang, Chang Zeng, Longbiao Wang, Jianwu Dang 0001, Masashi Unoki, 
Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection.
Interspeech2022
Junjie Li, Meng Ge, Zexu Pan, Longbiao Wang, Jianwu Dang 0001, 
VCSE: Time-Domain Visual-Contextual Speaker Extraction Network.
Interspeech2022
Nan Li, Meng Ge, Longbiao Wang, Masashi Unoki, Sheng Li 0010, Jianwu Dang 0001, 
Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network.
Interspeech2022
Siqing Qin, Longbiao Wang, Sheng Li 0010, Yuqin Lin, Jianwu Dang 0001, 
Finer-grained Modeling units-based Meta-Learning for Low-resource Tibetan Speech Recognition.
Interspeech2022
Hao Shi, Longbiao Wang, Sheng Li 0010, Jianwu Dang 0001, Tatsuya Kawahara, 
Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction.
Interspeech2022
Tongtong Song, Qiang Xu, Meng Ge, Longbiao Wang, Hao Shi, Yongjie Lv, Yuqin Lin, Jianwu Dang 0001, 
Language-specific Characteristic Assistance for Code-switching Speech Recognition.
Interspeech2022
Shiquan Wang, Yuke Si, Xiao Wei, Longbiao Wang, Zhiqiang Zhuang, Xiaowang Zhang, Jianwu Dang 0001, 
TopicKS: Topic-driven Knowledge Selection for Knowledge-grounded Dialogue Generation.
Interspeech2022
Xiao Wei, Yuke Si, Shiquan Wang, Longbiao Wang, Jianwu Dang 0001, 
Hierarchical Tagger with Multi-task Learning for Cross-domain Slot Filling.
Interspeech2022
Qiang Xu, Tongtong Song, Longbiao Wang, Hao Shi, Yuqin Lin, Yongjie Lv, Meng Ge, Qiang Yu 0005, Jianwu Dang 0001, 
Self-Distillation Based on High-level Information Supervision for Compressing End-to-End ASR Model.
ICASSP2022
Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
The First Multimodal Information Based Speech Processing (Misp) Challenge: Data, Tasks, Baselines And Results.
ICASSP2022
Maokui He, Xiang Lv, Weilin Zhou, Jingjing Yin, Xiaoqi Zhang, Yuxuan Wang, Shutong Niu, Yuhang Cao, Heng Lu, Jun Du, Chin-Hui Lee, 
The USTC-Ximalaya System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription (M2met) Challenge.
ICASSP2022
Zhaoxu Nian, Jun Du, Yu Ting Yeung, Renyu Wang, 
A Time Domain Progressive Learning Approach with SNR Constriction for Single-Channel Speech Enhancement and Recognition.
Interspeech2022
Hang Chen, Jun Du, Yusheng Dai, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Baocai Yin, Jia Pan, 
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.
Interspeech2022
Mao-Kui He, Jun Du, Chin-Hui Lee, 
End-to-End Audio-Visual Neural Speaker Diarization.
Interspeech2022
Yajian Wang, Jun Du, Hang Chen, Qing Wang 0008, Chin-Hui Lee, 
Deep Segment Model for Acoustic Scene Classification.
Interspeech2022
Yanyan Yue, Jun Du, Mao-Kui He, Yu Ting Yeung, Renyu Wang, 
Online Speaker Diarization with Core Samples Selection.
Interspeech2022
Guolong Zhong, Hongyu Song, Ruoyu Wang 0029, Lei Sun, Diyuan Liu, Jia Pan, Xin Fang, Jun Du, Jie Zhang, Lirong Dai, 
External Text Based Data Augmentation for Low-Resource Speech Recognition in the Constrained Condition of OpenASR21 Challenge.
Interspeech2022
Hengshun Zhou, Jun Du, Gongzhen Zou, Zhaoxu Nian, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Shifu Xiong, Jianqing Gao, 
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis.
TASLP2021
Li Chai 0002, Jun Du, Qing-Feng Liu, Chin-Hui Lee, 
A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement.
ICASSP2021
Zhaoxu Nian, Yan-Hui Tu, Jun Du, Chin-Hui Lee, 
A Progressive Learning Approach to Adaptive Noise and Speech Estimation for Speech Enhancement and Noisy Speech Recognition.
Interspeech2021
Hang Chen, Jun Du, Yu Hu 0003, Li-Rong Dai 0001, Bao-Cai Yin, Chin-Hui Lee, 
Automatic Lip-Reading with Hierarchical Pyramidal Convolution and Self-Attention for Image Sequences with No Word Boundaries.
Interspeech2021
Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen 0006, Yanxin Hu, Lei Xie 0001, Jian Wu 0027, Hui Bu, Xin Xu, Jun Du, Jingdong Chen, 
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario.
Interspeech2021
Maokui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen 0006, Shinji Watanabe 0001, 
Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speakers.
Interspeech2021
Koen Oostermeijer, Qing Wang 0008, Jun Du, 
Lightweight Causal Transformer with Local Self-Attention for Real-Time Speech Enhancement.
Interspeech2021
Neville Ryant, Prachi Singh, Venkat Krishnamohan, Rajat Varma, Kenneth Church 0001, Christopher Cieri, Jun Du, Sriram Ganapathy, Mark Liberman, 
The Third DIHARD Diarization Challenge.
Interspeech2021
Yu-Xuan Wang, Jun Du, Maokui He, Shutong Niu, Lei Sun, Chin-Hui Lee, 
Scenario-Dependent Speaker Diarization for DIHARD-III Challenge.
Interspeech2021
Xiaoqi Zhang, Jun Du, Li Chai 0002, Chin-Hui Lee, 
A Maximum Likelihood Approach to SNR-Progressive Learning Using Generalized Gaussian Distribution for LSTM-Based Speech Enhancement.
Interspeech2021
Hengshun Zhou, Jun Du, Hang Chen, Zijun Jing, Shifu Xiong, Chin-Hui Lee, 
Audio-Visual Information Fusion Using Cross-Modal Teacher-Student Learning for Voice Activity Detection in Realistic Environments.
TASLP2020
Jia Pan, Genshun Wan, Jun Du, Zhongfu Ye, 
Online Speaker Adaptation Using Memory-Aware Networks for Speech Recognition.
SpeechComm2022
Hemant Kumar Kathania, Sudarsana Reddy Kadiri, Paavo Alku, Mikko Kurimo, 
A formant modification method for improved ASR of children's speech.
SpeechComm2022
Mittapalle Kiran Reddy, Hilla Pohjalainen, Pyry Helkkula, Kasimir Kaitue, Mikko Minkkinen, Heli Tolppanen, Tuomo Nieminen, Paavo Alku, 
Glottal flow characteristics in vowels produced by speakers with heart failure.
Interspeech2022
Farhad Javanmardi, Sudarsana Reddy Kadiri, Manila Kodali, Paavo Alku, 
Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers.
Interspeech2022
Sudarsana Reddy Kadiri, Farhad Javanmardi, Paavo Alku, 
Convolutional Neural Networks for Classification of Voice Qualities from Speech and Neck Surface Accelerometer Signals.
SpeechComm2021
Krishna Gurugubelli, Anil Kumar Vuppala, N. P. Narendra, Paavo Alku, 
Duration of the rhotic approximant /ɹ/ in spastic dysarthria of different severity levels.
TASLP2021
N. P. Narendra, Björn W. Schuller, Paavo Alku, 
The Detection of Parkinson's Disease From Speech Using Voice Source Information.
SpeechComm2020
Sudarsana Reddy Kadiri, Paavo Alku, B. Yegnanarayana 0001, 
Analysis and classification of phonation types in speech and singing voice.
SpeechComm2020
N. P. Narendra, Paavo Alku, 
Automatic intelligibility assessment of dysarthric speech using glottal parameters.
TASLP2020
Dhananjaya N. Gowda, Sudarsana Reddy Kadiri, Brad H. Story, Paavo Alku, 
Time-Varying Quasi-Closed-Phase Analysis for Accurate Formant Tracking in Speech Signals.
ICASSP2020
Sudarsana Reddy Kadiri, Paavo Alku, B. Yegnanarayana 0001, 
Comparison of Glottal Closure Instants Detection Algorithms for Emotional Speech.
ICASSP2020
Hemant Kumar Kathania, Sudarsana Reddy Kadiri, Paavo Alku, Mikko Kurimo, 
Study of Formant Modification for Children ASR.
Interspeech2020
Sudarsana Reddy Kadiri, Rashmi Kethireddy, Paavo Alku, 
Parkinson's Disease Detection from Speech Using Single Frequency Filtering Cepstral Coefficients.
SpeechComm2019
N. P. Narendra, Manu Airaksinen, Brad H. Story, Paavo Alku, 
Estimation of the glottal source from coded telephone speech using deep neural networks.
SpeechComm2019
Paavo Alku, Tiina Murtola, Jarmo Malinen, Juha Kuortti, Brad H. Story, Manu Airaksinen, Mika Salmi, Erkki Vilkman, Ahmed Geneid, 
OPENGLOT - An open environment for the evaluation of glottal inverse filtering.
SpeechComm2019
Tiina Murtola, Jarmo Malinen, Ahmed Geneid, Paavo Alku, 
Analysis of phonation onsets in vowel production, using information from glottal area and flow estimate.
SpeechComm2019
Bajibabu Bollepalli, Lauri Juvela, Manu Airaksinen, Cassia Valentini-Botinhao, Paavo Alku, 
Normal-to-Lombard adaptation of speech synthesis using long short-term memory recurrent neural networks.
SpeechComm2019
N. P. Narendra, Paavo Alku, 
Dysarthric speech classification from coded telephone speech using glottal features.
TASLP2019
Lauri Juvela, Bajibabu Bollepalli, Vassilis Tsiaras, Paavo Alku, 
GlotNet - A Raw Waveform Model for the Glottal Excitation in Statistical Parametric Speech Synthesis.
ICASSP2019
Manu Airaksinen, Lauri Juvela, Paavo Alku, Okko Räsänen, 
Data Augmentation Strategies for Neural Network F0 Estimation.
ICASSP2019
Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku, 
Waveform Generation for Text-to-speech Synthesis Using Pitch-synchronous Multi-scale Generative Adversarial Networks.
ICASSP2022
Hiroshi Sawada, Rintaro Ikeshita, Keisuke Kinoshita, Tomohiro Nakatani, 
Multi-Frame Full-Rank Spatial Covariance Analysis for Underdetermined BSS in Reverberant Environments.
Interspeech2022
Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolíková, Hiroshi Sato, Tomohiro Nakatani, 
Listen only to me! How well can target speech extraction handle false alarms?
TASLP2021
Nobutaka Ito, Rintaro Ikeshita, Hiroshi Sawada, Tomohiro Nakatani, 
A Joint Diagonalization Based Efficient Approach to Underdetermined Blind Audio Source Separation Using the Multichannel Wiener Filter.
ICASSP2021
Christoph Böddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Reinhold Haeb-Umbach, 
Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.
ICASSP2021
Marc Delcroix, Katerina Zmolíková, Tsubasa Ochiai, Keisuke Kinoshita, Tomohiro Nakatani, 
Speaker Activity Driven Neural Speech Extraction.
ICASSP2021
Julio Wissing, Benedikt T. Boenninghoff, Dorothea Kolossa, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Christopher Schymura, 
Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain.
Interspeech2021
Christopher Schymura, Benedikt T. Bönninghoff, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa, 
PILOT: Introducing Transformers for Probabilistic Sound Event Localization.
Interspeech2021
Ayako Yamamoto, Toshio Irino, Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, 
Comparison of Remote Experiments Using Crowdsourcing and Laboratory Experiments on Speech Intelligibility.
SpeechComm2020
Katsuhiko Yamamoto, Toshio Irino, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani, 
GEDI: Gammachirp envelope distortion index for predicting intelligibility of enhanced speech.
TASLP2020
Tomohiro Nakatani, Christoph Böddeker, Keisuke Kinoshita, Rintaro Ikeshita, Marc Delcroix, Reinhold Haeb-Umbach, 
Jointly Optimal Denoising, Dereverberation, and Source Separation.
ICASSP2020
Marc Delcroix, Tsubasa Ochiai, Katerina Zmolíková, Keisuke Kinoshita, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki, 
Improving Speaker Discrimination of Target Speech Extraction With Time-Domain Speakerbeam.
ICASSP2020
Rintaro Ikeshita, Tomohiro Nakatani, Shoko Araki, 
Overdetermined Independent Vector Analysis.
ICASSP2020
Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, 
Improving Noise Robust Automatic Speech Recognition with Single-Channel Time-Domain Enhancement Network.
ICASSP2020
Tatsuki Kondo, Kanta Fukushige, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Rintaro Ikeshita, Tomohiro Nakatani, 
Convergence-Guaranteed Independent Positive Semidefinite Tensor Analysis Based on Student's T Distribution.
ICASSP2020
Thilo von Neumann, Keisuke Kinoshita, Lukas Drude, Christoph Böddeker, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach, 
End-to-End Training of Time Domain Audio Separation and Recognition.
ICASSP2020
Christopher Schymura, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa, 
A Dynamic Stream Weight Backprop Kalman Filter for Audiovisual Speaker Tracking.
Interspeech2020
Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Toshio Irino, 
Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-Based ASR System.
Interspeech2020
Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach, 
Multi-Path RNN for Hierarchical Modeling of Long Sequential Data and its Application to Speaker Stream Separation.
Interspeech2020
Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Shoko Araki, 
Computationally Efficient and Versatile Framework for Joint Optimization of Blind Speech Separation and Dereverberation.
Interspeech2020
Thilo von Neumann, Christoph Böddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach, 
Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR.
SpeechComm2022
Lili Guo, Longbiao Wang, Jianwu Dang 0001, Eng Siong Chng, Seiichi Nakagawa, 
Learning affective representations based on magnitude and dynamic relative phase information for speech emotion recognition.
ICASSP2022
Chen Chen, Yuchen Hu, Nana Hou, Xiaofeng Qi, Heqing Zou, Eng Siong Chng, 
Self-Critical Sequence Training for Automatic Speech Recognition.
ICASSP2022
Chen Chen, Nana Hou, Yuchen Hu, Shashank Shirol, Eng Siong Chng, 
Noise-Robust Speech Recognition With 10 Minutes Unparalleled In-Domain Data.
ICASSP2022
Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang 0001, Haizhou Li 0001, 
L-SpEx: Localized Target Speaker Extraction.
ICASSP2022
Yuchen Hu, Nana Hou, Chen Chen, Eng Siong Chng, 
Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition.
ICASSP2022
Dianwen Ng, Yunqi Chen, Biao Tian, Qiang Fu 0001, Eng Siong Chng, 
Convmixer: Feature Interactive Convolution with Curriculum Learning for Small Footprint and Noisy Far-Field Keyword Spotting.
ICASSP2022
Yizhou Peng, Jicheng Zhang, Haihua Xu, Hao Huang, Eng Siong Chng, 
Minimum Word Error Training For Non-Autoregressive Transformer-Based Code-Switching ASR.
ICASSP2022
Fuzhao Xue, Aixin Sun, Hao Zhang 0048, Jinjie Ni, Eng Siong Chng, 
An Embarrassingly Simple Model for Dialogue Relation Extraction.
ICASSP2022
Heqing Zou, Yuke Si, Chen Chen, Deepu Rajan, Eng Siong Chng, 
Speech Emotion Recognition with Co-Attention Based Multi-Level Acoustic Information.
Interspeech2022
Chen Chen, Nana Hou, Yuchen Hu, Heqing Zou, Xiaofeng Qi, Eng Siong Chng, 
Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning.
Interspeech2022
Zixun Guo, Chen Chen, Eng Siong Chng, 
DENT-DDSP: Data-efficient noisy speech generator using differentiable digital signal processors for explicit distortion modelling and noise-robust speech recognition.
Interspeech2022
Tarun Gupta, Duc-Tuan Truong, Tran The Anh, Eng Siong Chng, 
Estimation of speaker age and height from speech signal using bi-encoder transformer mixture model.
Interspeech2022
Yang Xiao, Nana Hou, Eng Siong Chng, 
Rainbow Keywords: Efficient Incremental Learning for Online Spoken Keyword Spotting.
ICASSP2021
Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang 0001, Haizhou Li 0001, 
Multi-Stage Speaker Extraction with Utterance and Frame-Level Reference Signals.
ICASSP2021
Lili Guo, Longbiao Wang, Chenglin Xu, Jianwu Dang 0001, Eng Siong Chng, Haizhou Li 0001, 
Representation Learning with Spectro-Temporal-Channel Attention for Speech Emotion Recognition.
ICASSP2021
Nana Hou, Chenglin Xu, Eng Siong Chng, Haizhou Li 0001, 
Learning Disentangled Feature Representations for Speech Enhancement Via Adversarial Training.
Interspeech2021
Weiguang Chen, Van Tung Pham, Eng Siong Chng, Xionghu Zhong, 
Overlapped Speech Detection Based on Spectral and Spatial Feature Fusion.
Interspeech2021
Jicheng Zhang, Yizhou Peng, Van Tung Pham, Haihua Xu, Hao Huang, Eng Siong Chng, 
E2E-Based Multi-Task Learning Approach to Joint Speech and Accent Recognition.
EMNLP2021
Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq R. Joty, Eng Siong Chng, Bin Ma 0001, 
A Unified Speaker Adaptation Approach for ASR.
TASLP2020
Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li 0001, 
SpEx: Multi-Scale Time Domain Speaker Extraction Network.
ICASSP2022
Liang Lu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Endpoint Detection for Streaming End-to-End Multi-Talker ASR.
Interspeech2022
Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li 0001, Xie Chen 0001, Yu Wu 0012, Yifan Gong 0001, 
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.
ICASSP2021
Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu 0001, Xie Chen 0001, Jinyu Li 0001, Yifan Gong 0001, 
Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition.
ICASSP2021
Eric Sun, Liang Lu 0001, Zhong Meng, Yifan Gong 0001, 
Sequence-Level Self-Teaching Regularization.
ICASSP2021
Jeremy H. M. Wong, Dimitrios Dimitriadis, Ken'ichi Kumatani, Yashesh Gaur, George Polovets, Partha Parthasarathy, Eric Sun, Jinyu Li 0001, Yifan Gong 0001, 
Ensemble Combination between Different Time Segmentations.
ICASSP2021
Jeremy H. M. Wong, Xiong Xiao, Yifan Gong 0001, 
Hidden Markov Model Diarisation with Speaker Location Information.
ICASSP2021
Xiong Xiao, Naoyuki Kanda, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao 0008, Gang Liu, Yu Wu 0012, Jian Wu 0027, Shujie Liu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.
Interspeech2021
Liang Lu 0001, Naoyuki Kanda, Jinyu Li 0001, Yifan Gong 0001, 
Streaming Multi-Talker Speech Recognition with Joint Speaker Identification.
Interspeech2021
Liang Lu 0001, Zhong Meng, Naoyuki Kanda, Jinyu Li 0001, Yifan Gong 0001, 
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer.
Interspeech2021
Yan Huang 0028, Guoli Ye, Jinyu Li 0001, Yifan Gong 0001, 
Rapid Speaker Adaptation for Conformer Transducer: Attention and Bias Are All You Need.
Interspeech2021
Yan Deng, Rui Zhao 0017, Zhong Meng, Xie Chen 0001, Bing Liu, Jinyu Li 0001, Yifan Gong 0001, Lei He 0005, 
Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS.
Interspeech2021
Vikas Joshi, Amit Das, Eric Sun, Rupesh R. Mehta, Jinyu Li 0001, Yifan Gong 0001, 
Multiple Softmax Architecture for Streaming Multilingual End-to-End ASR Systems.
Interspeech2021
Zhong Meng, Yu Wu 0012, Naoyuki Kanda, Liang Lu 0001, Xie Chen 0001, Guoli Ye, Eric Sun, Jinyu Li 0001, Yifan Gong 0001, 
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition.
Interspeech2021
Eric Sun, Jinyu Li 0001, Zhong Meng, Yu Wu 0012, Jian Xue, Shujie Liu 0001, Yifan Gong 0001, 
Improving Multilingual Transformer Transducer Models by Reducing Language Confusions.
ICASSP2020
Yan Huang 0028, Lei He 0005, Wenning Wei, William Gale, Jinyu Li 0001, Yifan Gong 0001, 
Using Personalized Speech Synthesis and Neural Language Generator for Rapid Speaker Adaptation.
ICASSP2020
Hirofumi Inaguma, Yashesh Gaur, Liang Lu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR.
ICASSP2020
Jinyu Li 0001, Rui Zhao 0017, Eric Sun, Jeremy H. M. Wong, Amit Das, Zhong Meng, Yifan Gong 0001, 
High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model.
ICASSP2020
Zhong Meng, Hu Hu, Jinyu Li 0001, Changliang Liu, Yan Huang 0028, Yifan Gong 0001, Chin-Hui Lee, 
L-Vector: Neural Label Embedding for Domain Adaptation.
ICASSP2020
Eva Sharma, Guoli Ye, Wenning Wei, Rui Zhao 0017, Yao Tian, Jian Wu 0027, Lei He 0005, Ed Lin, Yifan Gong 0001, 
Adaptation of RNN Transducer with Text-To-Speech Technology for Keyword Spotting.
Interspeech2020
Yan Huang 0028, Jinyu Li 0001, Lei He 0005, Wenning Wei, William Gale, Yifan Gong 0001, 
Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator.
ICASSP2022
Soumya Dutta, Sriram Ganapathy, 
Multimodal Transformer with Learnable Frontend and Self Attention for Emotion Recognition.
ICASSP2022
Varun Krishna, Sriram Ganapathy, 
Self Supervised Representation Learning with Deep Clustering for Acoustic Unit Discovery from Raw Speech.
ICASSP2022
Rohit Kumar, Anurenjan Purushothaman, Anirudh Sreeram, Sriram Ganapathy, 
End-To-End Speech Recognition with Joint Dereverberation of Sub-Band Autoregressive Envelopes.
ICASSP2022
Neeraj Kumar Sharma 0001, Srikanth Raj Chetupalli, Debarpan Bhattacharya, Debottam Dutta, Pravin Mote, Sriram Ganapathy, 
The Second Dicova Challenge: Dataset and Performance Analysis for Diagnosis of Covid-19 Using Acoustics.
Interspeech2022
Shrutina Agarwal, Naoya Takahashi, Sriram Ganapathy, 
Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer.
Interspeech2022
Tarun Sai Bandarupalli, Shakti Rath, Nirmesh Shah, Naoyuki Onoe, Sriram Ganapathy, 
Semi-supervised Acoustic and Language Modeling for Hindi ASR.
Interspeech2022
Debarpan Bhattacharya, Debottam Dutta, Neeraj Kumar Sharma 0001, Srikanth Raj Chetupalli, Pravin Mote, Sriram Ganapathy, Chandrakiran C, Sahiti Nori, Suhail K. K, Sadhana Gonuguntla, Murali Alagesan, 
Coswara: A website application enabling COVID-19 screening by analysing respiratory sound samples and health symptoms.
Interspeech2022
Debarpan Bhattacharya, Debottam Dutta, Neeraj Kumar Sharma 0001, Srikanth Raj Chetupalli, Pravin Mote, Sriram Ganapathy, Chandrakiran C, Sahiti Nori, Suhail K. K, Sadhana Gonuguntla, Murali Alagesan, 
Analyzing the impact of SARS-CoV-2 variants on respiratory sound signals.
Interspeech2022
Srikanth Raj Chetupalli, Sriram Ganapathy, 
Speaker conditioned acoustic modeling for multi-speaker conversational ASR.
Interspeech2022
Debottam Dutta, Debarpan Bhattacharya, Sriram Ganapathy, Amir Hossein Poorjam, Deepak Mittal, Maneesh Singh 0001, 
Acoustic Representation Learning on Breathing and Speech Signals for COVID-19 Detection.
Interspeech2022
M. K. Jayesh, Mukesh Sharma, Praneeth Vonteddu, Mahaboob Ali Basha Shaik, Sriram Ganapathy, 
Transformer Networks for Non-Intrusive Speech Quality Prediction.
TASLP2021
Prachi Singh, Sriram Ganapathy, 
Self-Supervised Representation Learning With Path Integral Clustering for Speaker Diarization.
ICASSP2021
Purvi Agrawal, Sriram Ganapathy, 
Representation Learning for Speech Recognition Using Feedback Based Relevance Weighting.
ICASSP2021
Jaswanth Reddy Katthi, Sriram Ganapathy, 
Deep Multiway Canonical Correlation Analysis For Multi-Subject Eeg Normalization.
Interspeech2021
Flávio Ávila, Amir H. Poorjam, Deepak Mittal, Charles Dognin, Ananya Muguli, Rohit Kumar, Srikanth Raj Chetupalli, Sriram Ganapathy, Maneesh Singh 0001, 
Investigating Feature Selection and Explainability for COVID-19 Diagnostics from Cough Sounds.
Interspeech2021
Sriram Ganapathy, 
Uncovering the Acoustic Cues of COVID-19 Infection.
Interspeech2021
Ananya Muguli, Lancelot Pinto, Nirmala R., Neeraj Kumar Sharma 0001, Prashant Krishnan V, Prasanta Kumar Ghosh, Rohit Kumar, Shrirama Bhat, Srikanth Raj Chetupalli, Sriram Ganapathy, Shreyas Ramoji, Viral Nanda, 
DiCOVA Challenge: Dataset, Task, and Baseline System for COVID-19 Diagnosis Using Acoustics.
Interspeech2021
R. G. Prithvi Raj, Rohit Kumar, M. K. Jayesh, Anurenjan Purushothaman, Sriram Ganapathy, M. Ali Basha Shaik, 
SRIB-LEAP Submission to Far-Field Multi-Channel Speech Enhancement Challenge for Video Conferencing.
Interspeech2021
Neville Ryant, Prachi Singh, Venkat Krishnamohan, Rajat Varma, Kenneth Church 0001, Christopher Cieri, Jun Du, Sriram Ganapathy, Mark Liberman, 
The Third DIHARD Diarization Challenge.
Interspeech2021
Prachi Singh, Rajat Varma, Venkat Krishnamohan, Srikanth Raj Chetupalli, Sriram Ganapathy, 
LEAP Submission for the Third DIHARD Diarization Challenge.
TASLP2022
Yang Ai, Zhen-Hua Ling, Wei-Lu Wu, Ang Li, 
Denoising-and-Dereverberation Hierarchical Neural Vocoder for Statistical Parametric Speech Synthesis.
ICASSP2022
Yan-Nian Chen, Li-Juan Liu, Ya-Jun Hu, Yuan Jiang 0006, Zhen-Hua Ling, 
Improving Recognition-Synthesis Based any-to-one Voice Conversion with Cyclic Training.
ICASSP2022
Lu Dong, Zhiqiang Guo, Chao-Hong Tan, Ya-Jun Hu, Yuan Jiang 0006, Zhen-Hua Ling, 
Neural Grapheme-To-Phoneme Conversion with Pre-Trained Grapheme Models.
ICASSP2022
Cheng Gong, Longbiao Wang, Zhenhua Ling, Ju Zhang, Jianwu Dang 0001, 
Using Multiple Reference Audios and Style Embedding Constraints for Speech Synthesis.
ICASSP2022
Zhengyan Sheng, Zhiqiang Guo, Xin Li 0064, Yunxia Li, Zhenhua Ling, 
Dementia Detection by Fusing Speech and Eye-Tracking Representation.
ICASSP2022
Ning-Qian Wu, Zhaoci Liu, Zhen-Hua Ling, 
Discourse-Level Prosody Modeling with a Variational Autoencoder for Non-Autoregressive Expressive Speech Synthesis.
Interspeech2022
Chang Liu, Zhen-Hua Ling, Ling-Hui Chen, 
Pronunciation Dictionary-Free Multilingual Speech Synthesis by Combining Unsupervised and Supervised Phonetic Representations.
Interspeech2022
Zhaoci Liu, Ning-Qian Wu, Yajie Zhang, Zhenhua Ling, 
Integrating Discrete Word-Level Style Variations into Non-Autoregressive Acoustic Models for Speech Synthesis.
Interspeech2022
Yukun Peng, Zhenhua Ling, 
Decoupled Pronunciation and Prosody Modeling in Meta-Learning-based Multilingual Speech Synthesis.
TASLP2021
Yi-Yang Ding, Hao-Jian Lin, Li-Juan Liu, Zhen-Hua Ling, Yu Hu 0003, 
Robustness of Speech Spoofing Detectors Against Adversarial Post-Processing of Voice Conversion.
TASLP2021
Jia-Chen Gu, Tianda Li, Zhen-Hua Ling, Quan Liu, Zhiming Su, Yu-Ping Ruan, Xiaodan Zhu, 
Deep Contextualized Utterance Representations for Response Selection and Dialogue Analysis.
TASLP2021
Yajie Zhang, Zhen-Hua Ling, 
Extracting and Predicting Word-Level Style Variations for Speech Synthesis.
TASLP2021
Xiao Zhou, Zhen-Hua Ling, Li-Rong Dai 0001, 
UnitNet: A Sequence-to-Sequence Acoustic Model for Concatenative Speech Synthesis.
ICASSP2021
Cheng Gong, Longbiao Wang, Zhenhua Ling, Shaotong Guo, Ju Zhang 0001, Jianwu Dang 0001, 
Improving Naturalness and Controllability of Sequence-to-Sequence Speech Synthesis by Learning Local Prosody Representations.
ICASSP2021
Zhaoci Liu, Zhiqiang Guo, Zhenhua Ling, Yunxia Li, 
Detecting Alzheimer's Disease from Speech Using Neural Networks with Bottleneck Features and Data Augmentation.
Interspeech2021
Yue Chen, Zhen-Hua Ling, Qing-Feng Liu, 
A Neural-Network-Based Approach to Identifying Speakers in Novels.
Interspeech2021
Yi-Yang Ding, Li-Juan Liu, Yu Hu 0003, Zhen-Hua Ling, 
Adversarial Voice Conversion Against Neural Spoofing Detectors.
Interspeech2021
Xiao Zhou, Zhen-Hua Ling, Li-Rong Dai 0001, 
UnitNet-Based Hybrid Speech Synthesis.
AAAI2021
Jing-Xuan Zhang, Korin Richmond, Zhen-Hua Ling, Lirong Dai 0001, 
TaLNet: Voice Reconstruction from Tongue and Lip Articulation with Transfer Learning from Text-to-Speech Synthesis.
EMNLP2021
Jia-Chen Gu, Zhen-Hua Ling, Yu Wu, Quan Liu, Zhigang Chen 0003, Xiaodan Zhu, 
Detecting Speaker Personas from Conversational Texts.
SpeechComm2022
Lili Guo, Longbiao Wang, Jianwu Dang 0001, Eng Siong Chng, Seiichi Nakagawa, 
Learning affective representations based on magnitude and dynamic relative phase information for speech emotion recognition.
ICASSP2022
Yuan Gao, Shogo Okada, Longbiao Wang, Jiaxing Liu, Jianwu Dang 0001, 
Domain-Invariant Feature Learning for Cross Corpus Speech Emotion Recognition.
ICASSP2022
Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang 0001, Haizhou Li 0001, 
L-SpEx: Localized Target Speaker Extraction.
ICASSP2022
Cheng Gong, Longbiao Wang, Zhenhua Ling, Ju Zhang, Jianwu Dang 0001, 
Using Multiple Reference Audios and Style Embedding Constraints for Speech Synthesis.
ICASSP2022
Yongjie Lv, Longbiao Wang, Meng Ge, Sheng Li 0010, Chenchen Ding, Lixin Pan, Yuguang Wang, Jianwu Dang 0001, Kiyoshi Honda, 
Compressing Transformer-Based ASR Model by Task-Driven Loss and Attention-Based Multi-Level Feature Distillation.
ICASSP2022
Yaodong Song, Jiaxing Liu, Longbiao Wang, Ruiguo Yu, Jianwu Dang 0001, 
Multi-Stage Graph Representation Learning for Dialogue-Level Speech Emotion Recognition.
ICASSP2022
Kaili Zhang, Cheng Gong, Wenhuan Lu, Longbiao Wang, Jianguo Wei, Dawei Liu, 
Joint and Adversarial Training with ASR for Expressive Speech Synthesis.
ICASSP2022
Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, Hui Chen, 
Learning Domain-Invariant Transformation for Speaker Verification.
ICASSP2022
Xiangyu Zhao, Longbiao Wang, Jianwu Dang 0001, 
Improving Dialogue Generation via Proactively Querying Grounded Knowledge.
Interspeech2022
Yanjie Fu, Meng Ge, Haoran Yin, Xinyuan Qian, Longbiao Wang, Gaoyan Zhang, Jianwu Dang 0001, 
Iterative Sound Source Localization for Unknown Number of Sources.
Interspeech2022
Jiaxu He, Cheng Gong, Longbiao Wang, Di Jin 0001, Xiaobao Wang, Junhai Xu, Jianwu Dang 0001, 
Improve emotional speech synthesis quality by learning explicit and implicit representations with semi-supervised training.
Interspeech2022
Kai Li 0018, Sheng Li 0010, Xugang Lu, Masato Akagi, Meng Liu, Lin Zhang, Chang Zeng, Longbiao Wang, Jianwu Dang 0001, Masashi Unoki, 
Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection.
Interspeech2022
Junjie Li, Meng Ge, Zexu Pan, Longbiao Wang, Jianwu Dang 0001, 
VCSE: Time-Domain Visual-Contextual Speaker Extraction Network.
Interspeech2022
Nan Li, Meng Ge, Longbiao Wang, Masashi Unoki, Sheng Li 0010, Jianwu Dang 0001, 
Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network.
Interspeech2022
Siqing Qin, Longbiao Wang, Sheng Li 0010, Yuqin Lin, Jianwu Dang 0001, 
Finer-grained Modeling units-based Meta-Learning for Low-resource Tibetan Speech Recognition.
Interspeech2022
Hao Shi, Longbiao Wang, Sheng Li 0010, Jianwu Dang 0001, Tatsuya Kawahara, 
Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction.
Interspeech2022
Tongtong Song, Qiang Xu, Meng Ge, Longbiao Wang, Hao Shi, Yongjie Lv, Yuqin Lin, Jianwu Dang 0001, 
Language-specific Characteristic Assistance for Code-switching Speech Recognition.
Interspeech2022
Shiquan Wang, Yuke Si, Xiao Wei, Longbiao Wang, Zhiqiang Zhuang, Xiaowang Zhang, Jianwu Dang 0001, 
TopicKS: Topic-driven Knowledge Selection for Knowledge-grounded Dialogue Generation.
Interspeech2022
Xiao Wei, Yuke Si, Shiquan Wang, Longbiao Wang, Jianwu Dang 0001, 
Hierarchical Tagger with Multi-task Learning for Cross-domain Slot Filling.
Interspeech2022
Qiang Xu, Tongtong Song, Longbiao Wang, Hao Shi, Yuqin Lin, Yongjie Lv, Meng Ge, Qiang Yu 0005, Jianwu Dang 0001, 
Self-Distillation Based on High-level Information Supervision for Compressing End-to-End ASR Model.
ICASSP2022
Jingbei Li, Yi Meng, Chenyi Li, Zhiyong Wu 0001, Helen Meng, Chao Weng, Dan Su 0002, 
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling.
ICASSP2022
Songxiang Liu, Shan Yang, Dan Su 0002, Dong Yu 0001, 
Referee: Towards Reference-Free Cross-Speaker Style Transfer with Low-Quality Data for Expressive Speech Synthesis.
ICASSP2022
Dongpeng Ma, Yiwen Wang, Liqiang He, Mingjie Jin, Dan Su 0002, Dong Yu 0001, 
DP-DWA: Dual-Path Dynamic Weight Attention Network With Streaming Dfsmn-San For Automatic Speech Recognition.
ICASSP2022
Zhao You, Shulin Feng, Dan Su 0002, Dong Yu 0001, 
Speechmoe2: Mixture-of-Experts Model with Improved Routing.
ICASSP2022
Naijun Zheng, Na Li 0012, Xixin Wu, Lingwei Meng, Jiawen Kang, Haibin Wu, Chao Weng, Dan Su 0002, Helen Meng, 
The CUHK-Tencent Speaker Diarization System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
ICASSP2022
Naijun Zheng, Na Li 0012, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
Multi-Channel Speaker Diarization Using Spatial Features for Meetings.
Interspeech2022
Yi Lei, Shan Yang, Jian Cong, Lei Xie 0001, Dan Su 0002, 
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion.
Interspeech2022
Xiaoyi Qin, Na Li 0012, Chao Weng, Dan Su 0002, Ming Li 0026, 
Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings.
Interspeech2022
Liumeng Xue, Shan Yang, Na Hu, Dan Su 0002, Lei Xie 0001, 
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers.
Interspeech2022
Yixuan Zhou, Changhe Song, Jingbei Li, Zhiyong Wu 0003, Yanyao Bian, Dan Su 0002, Helen Meng, 
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis.
Interspeech2022
Yixuan Zhou, Changhe Song, Xiang Li, Luwen Zhang, Zhiyong Wu 0001, Yanyao Bian, Dan Su 0002, Helen Meng, 
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis.
ICLR2022
Max W. Y. Lam, Jun Wang 0091, Dan Su 0002, Dong Yu 0001, 
BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis.
IJCAI2022
Rongjie Huang, Max W. Y. Lam, Jun Wang 0091, Dan Su 0002, Dong Yu 0001, Yi Ren 0006, Zhou Zhao, 
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis.
ICASSP2021
Liqiang He, Dan Su 0002, Dong Yu 0001, 
Learned Transferable Architectures Can Surpass Hand-Designed Architectures for Large Scale Speech Recognition.
ICASSP2021
Max W. Y. Lam, Jun Wang 0091, Dan Su 0002, Dong Yu 0001, 
Sandglasset: A Light Multi-Granularity Self-Attentive Network for Time-Domain Speech Separation.
ICASSP2021
Xu Li, Na Li 0012, Chao Weng, Xunying Liu, Dan Su 0002, Dong Yu 0001, Helen Meng, 
Replay and Synthetic Speech Detection with Res2Net Architecture.
ICASSP2021
Xingchen Song, Zhiyong Wu 0001, Yiheng Huang, Chao Weng, Dan Su 0002, Helen M. Meng, 
Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input.
ICASSP2021
Naijun Zheng, Na Li 0012, Bo Wu, Meng Yu 0003, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
A Joint Training Framework of Multi-Look Separator and Speaker Embedding Extractor for Overlapped Speech.
Interspeech2021
Guoguo Chen, Shuzhou Chai, Guan-Bo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su 0002, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe 0001, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Zhao You, Zhiyong Yan, 
GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10,000 Hours of Transcribed Audio.
Interspeech2021
Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie 0001, Dan Su 0002, 
Controllable Context-Aware Conversational Speech Synthesis.
ICASSP2022
Zhehuai Chen, Yu Zhang 0033, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno 0001, Gary Wang, 
Tts4pretrain 2.0: Advancing the use of Text and Speech in ASR Pretraining with Consistency and Contrastive Losses.
ICASSP2022
Qiujia Li, Yu Zhang 0033, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland, 
Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition.
ICASSP2022
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.
ICASSP2022
Joel Shor, Aren Jansen, Wei Han 0002, Daniel S. Park, Yu Zhang 0033, 
Universal Paralinguistic Speech Representations Using self-Supervised Conformers.
Interspeech2022
Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang 0033, Nicolás Serrano, 
Reducing Domain mismatch in Self-supervised speech pre-training.
Interspeech2022
Zhehuai Chen, Yu Zhang 0033, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno 0001, Ankur Bapna, Heiga Zen, 
MAESTRO: Matched Speech Text Representations through Modality Matching.
Interspeech2022
Alexis Conneau, Ankur Bapna, Yu Zhang 0033, Min Ma, Patrick von Platen, Anton Lozhkov, Colin Cherry, Ye Jia, Clara Rivera, Mihir Kale, Daan van Esch, Vera Axelrod, Simran Khanuja, Jonathan H. Clark, Orhan Firat, Michael Auli, Sebastian Ruder, Jason Riesa, Melvin Johnson, 
XTREME-S: Evaluating Cross-lingual Speech Representations.
Interspeech2022
Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, Tom Kenter, Alexey Petelin, Jonathan Shen, Vincent Wan, Yu Zhang 0033, Yonghui Wu, Rob Clark, 
Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks.
Interspeech2022
Kuan-Po Huang, Yu-Kuan Fu, Yu Zhang 0033, Hung-yi Lee, 
Improving Distortion Robustness of Self-supervised Speech Processing Tasks with Domain Adaptation.
Interspeech2022
Ye Jia, Yifan Ding, Ankur Bapna, Colin Cherry, Yu Zhang 0033, Alexis Conneau, Nobu Morioka, 
Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation.
Interspeech2022
Zhiyun Lu, Yongqiang Wang, Yu Zhang 0033, Wei Han, Zhehuai Chen, Parisa Haghani, 
Unsupervised Data Selection via Discrete Speech Representation for ASR.
ICASSP2021
Thibault Doutre, Wei Han 0002, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang 0033, Liangliang Cao, 
Improving Streaming Automatic Speech Recognition with Non-Streaming Model Distillation on Unsupervised Data.
ICASSP2021
Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang 0033, Ye Jia, Ron J. Weiss, Yonghui Wu, 
Parallel Tacotron: Non-Autoregressive and Controllable TTS.
ICASSP2021
Bo Li 0028, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han 0002, Qiao Liang, Yu Zhang 0033, Trevor Strohman, Yonghui Wu, 
A Better and Faster end-to-end Model for Streaming ASR.
ICASSP2021
Qiujia Li, David Qiu, Yu Zhang 0033, Bo Li 0028, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman, 
Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition.
ICASSP2021
David Qiu, Qiujia Li, Yanzhang He, Yu Zhang 0033, Bo Li 0028, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li 0133, Ke Hu, Tara N. Sainath, Ian McGraw, 
Learning Word-Level Confidence for Subword End-To-End ASR.
ICASSP2021
Harsh Shrivastava 0001, Ankush Garg, Yuan Cao 0007, Yu Zhang 0033, Tara N. Sainath, 
Echo State Speech Recognition.
Interspeech2021
Zhehuai Chen, Andrew Rosenberg, Yu Zhang 0033, Heiga Zen, Mohammadreza Ghodsi, Yinghui Huang, Jesse Emond, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation.
Interspeech2021
Nanxin Chen, Yu Zhang 0033, Heiga Zen, Ron J. Weiss, Mohammad Norouzi 0002, Najim Dehak, William Chan, 
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis.
Interspeech2021
Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang 0033, Ye Jia, R. J. Skerry-Ryan, Yonghui Wu, 
Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling.
ICASSP2022
Tina Raissi, Eugen Beck, Ralf Schlüter, Hermann Ney, 
Improving Factored Hybrid HMM Acoustic Modeling without State Tying.
ICASSP2022
Nils-Philipp Wynands, Wilfried Michel, Jan Rosendahl, Ralf Schlüter, Hermann Ney, 
Efficient Sequence Training of Attention Models Using Approximative Recombination.
ICASSP2022
Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Wilfried Michel, Alexander Gerstenberger, Ralf Schlüter, Hermann Ney, 
Conformer-Based Hybrid ASR System For Switchboard Dataset.
ICASSP2022
Wei Zhou, Zuoyun Zheng, Ralf Schlüter, Hermann Ney, 
On Language Model Integration for RNN Transducer Based Speech Recognition.
Interspeech2022
Felix Meyer, Wilfried Michel, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney, 
Automatic Learning of Subword Dependent Model Scales.
Interspeech2022
Zijian Yang, Yingbo Gao, Alexander Gerstenberger, Jintao Jiang, Ralf Schlüter, Hermann Ney, 
Self-Normalized Importance Sampling for Neural Language Modeling.
Interspeech2022
Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Ralf Schlüter, Hermann Ney, 
Improving the Training Recipe for a Robust Conformer-based Hybrid Model.
Interspeech2022
Wei Zhou, Wilfried Michel, Ralf Schlüter, Hermann Ney, 
Efficient Training of Neural Transducer for Speech Recognition.
Interspeech2021
Yingbo Gao, David Thulke, Alexander Gerstenberger, Khoa Viet Tran, Ralf Schlüter, Hermann Ney, 
On Sampling-Based Training Criteria for Neural Language Modeling.
Interspeech2021
Hermann Ney, 
Forty Years of Speech and Language Processing: From Bayes Decision Rule to Deep Learning.
Interspeech2021
Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney, 
Investigating Methods to Improve Language Model Integration for Attention-Based Encoder-Decoder ASR Models.
Interspeech2021
Albert Zeyer, André Merboldt, Wilfried Michel, Ralf Schlüter, Hermann Ney, 
Librispeech Transducer Model with Internal Language Model Prior Correction.
Interspeech2021
Wei Zhou, Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney, 
Equivalence of Segmental and Neural Transducer Modeling: A Proof of Concept.
Interspeech2021
Wei Zhou, Mohammad Zeineldeen, Zuoyun Zheng, Ralf Schlüter, Hermann Ney, 
Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition.
ICASSP2020
Vitalii Bozheniuk, Albert Zeyer, Ralf Schlüter, Hermann Ney, 
A Comprehensive Study of Residual CNNS for Acoustic Modeling in ASR.
ICASSP2020
Alexander Gerstenberger, Kazuki Irie, Pavel Golik, Eugen Beck, Hermann Ney, 
Domain Robust, Fast, and Compact Neural Language Models.
ICASSP2020
Kazuki Irie, Alexander Gerstenberger, Ralf Schlüter, Hermann Ney, 
How Much Self-Attention Do We Need? Trading Attention for Feed-Forward Layers.
ICASSP2020
Wilfried Michel, Ralf Schlüter, Hermann Ney, 
Frame-Level MMI as A Sequence Discriminative Training Criterion for LVCSR.
ICASSP2020
Nick Rossenbach, Albert Zeyer, Ralf Schlüter, Hermann Ney, 
Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems.
ICASSP2020
Mohammad Zeineldeen, Albert Zeyer, Ralf Schlüter, Hermann Ney, 
Layer-Normalized LSTM for Hybrid-Hmm and End-To-End ASR.
TASLP2022
Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura 0001, 
A Machine Speech Chain Approach for Dynamically Adaptive Lombard TTS in Static and Dynamic Noise Environments.
TASLP2022
Bin Wu, Sakriani Sakti, Jinsong Zhang 0001, Satoshi Nakamura 0001, 
Modeling Unsupervised Empirical Adaptation by DPGMM and DPGMM-RNN Hybrid Model to Extract Perceptual Features for Low-Resource ASR.
Interspeech2022
Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura 0001, 
Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation.
Interspeech2022
Seiya Kawano, Muteki Arioka, Akishige Yuguchi, Kenta Yamamoto, Koji Inoue, Tatsuya Kawahara, Satoshi Nakamura 0001, Koichiro Yoshino, 
Multimodal Persuasive Dialogue Corpus using Teleoperated Android.
Interspeech2022
Heli Qi, Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura 0001, 
Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR via Speech Chain Reconstruction and Self-Transcribing.
TASLP2021
Bin Wu, Sakriani Sakti, Jinsong Zhang 0001, Satoshi Nakamura 0001, 
Tackling Perception Bias in Unsupervised Phoneme Discovery Using DPGMM-RNN Hybrid Model and Functional Load.
Interspeech2021
Johanes Effendi, Sakriani Sakti, Satoshi Nakamura 0001, 
Weakly-Supervised Speech-to-Text Mapping with Visually Connected Non-Parallel Speech-Text Data Using Cyclic Partially-Aligned Transformer.
Interspeech2021
Yuka Ko, Katsuhito Sudoh, Sakriani Sakti, Satoshi Nakamura 0001, 
ASR Posterior-Based Loss for Multi-Task End-to-End Speech Translation.
Interspeech2021
Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura 0001, 
Dynamically Adaptive Machine Speech Chain Inference for TTS in Noisy Environment: Listen and Speak Louder.
Interspeech2021
Shun Takahashi, Sakriani Sakti, Satoshi Nakamura 0001, 
Unsupervised Neural-Based Graph Clustering for Variable-Length Speech Representation Discovery of Zero-Resource Languages.
Interspeech2021
Hirotaka Tokuyama, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura 0001, 
Transcribing Paralinguistic Acoustic Cues to Target Language Text in Transformer-Based Speech-to-Text Translation.
TASLP2020
Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001, 
Machine Speech Chain.
TASLP2020
Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001, 
Corrections to "Machine Speech Chain".
ICASSP2020
Andros Tjandra, Chunxi Liu, Frank Zhang 0001, Xiaohui Zhang, Yongqiang Wang 0005, Gabriel Synnaeve, Satoshi Nakamura 0001, Geoffrey Zweig, 
DEJA-VU: Double Feature Presentation and Iterated Loss in Deep Transformer Networks.
Interspeech2020
Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001, 
Augmenting Images for ASR and TTS Through Single-Loop and Dual-Loop Multimodal Chain Framework.
Interspeech2020
Sashi Novitasari, Andros Tjandra, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura 0001, 
Incremental Machine Speech Chain Towards Enabling Listening While Speaking in Real-Time.
Interspeech2020
Ivan Halim Parmonangan, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura 0001, 
Combining Audio and Brain Activity for Predicting Speech Quality.
Interspeech2020
Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001, 
Transformer VQ-VAE for Unsupervised Unit Discovery and Speech Synthesis: ZeroSpeech 2020 Challenge.
Interspeech2020
Kazuki Tsunematsu, Johanes Effendi, Sakriani Sakti, Satoshi Nakamura 0001, 
Neural Speech Completion.
TASLP2019
Nurul Lubis, Sakriani Sakti, Koichiro Yoshino, Satoshi Nakamura 0001, 
Positive Emotion Elicitation in Chat-Based Dialogue Systems.
ICASSP2022
Keisuke Kinoshita, Marc Delcroix, Tomoharu Iwata, 
Tight Integration Of Neural- And Clustering-Based Diarization Through Deep Unfolding Of Infinite Gaussian Mixture Model.
ICASSP2022
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Naoyuki Kamo, Takafumi Moriya, 
Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition.
ICASSP2022
Hiroshi Sawada, Rintaro Ikeshita, Keisuke Kinoshita, Tomohiro Nakatani, 
Multi-Frame Full-Rank Spatial Covariance Analysis for Underdetermined BSS in Reverberant Environments.
Interspeech2022
Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolíková, Hiroshi Sato, Tomohiro Nakatani, 
Listen only to me! How well can target speech extraction handle false alarms?
Interspeech2022
Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Christoph Böddeker, Reinhold Haeb-Umbach, 
Utterance-by-utterance overlap-aware neural diarization with Graph-PIT.
Interspeech2022
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura, 
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.
ICASSP2021
Christoph Böddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Reinhold Haeb-Umbach, 
Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.
ICASSP2021
Marc Delcroix, Katerina Zmolíková, Tsubasa Ochiai, Keisuke Kinoshita, Tomohiro Nakatani, 
Speaker Activity Driven Neural Speech Extraction.
ICASSP2021
Chenda Li, Zhuo Chen 0006, Yi Luo 0004, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe 0001, Yanmin Qian, 
Dual-Path Modeling for Long Recording Speech Separation in Meetings.
ICASSP2021
Julio Wissing, Benedikt T. Boenninghoff, Dorothea Kolossa, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Christopher Schymura, 
Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain.
Interspeech2021
Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Shoko Araki, 
Few-Shot Learning of New Sound Classes for Target Sound Extraction.
Interspeech2021
Cong Han, Yi Luo 0004, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe 0001, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen 0006, 
Continuous Speech Separation Using Speaker Inventory for Long Recording.
Interspeech2021
Keisuke Kinoshita, Marc Delcroix, Naohiro Tawara, 
Advances in Integration of End-to-End Neural and Clustering-Based Diarization for Real Conversational Speech.
Interspeech2021
Thilo von Neumann, Keisuke Kinoshita, Christoph Böddeker, Marc Delcroix, Reinhold Haeb-Umbach, 
Graph-PIT: Generalized Permutation Invariant Training for Continuous Separation of Arbitrary Numbers of Speakers.
Interspeech2021
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoyuki Kamo, 
Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition.
Interspeech2021
Christopher Schymura, Benedikt T. Bönninghoff, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa, 
PILOT: Introducing Transformers for Probabilistic Sound Event Localization.
Interspeech2021
Ayako Yamamoto, Toshio Irino, Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, 
Comparison of Remote Experiments Using Crowdsourcing and Laboratory Experiments on Speech Intelligibility.
SpeechComm2020
Katsuhiko Yamamoto, Toshio Irino, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani, 
GEDI: Gammachirp envelope distortion index for predicting intelligibility of enhanced speech.
TASLP2020
Tomohiro Nakatani, Christoph Böddeker, Keisuke Kinoshita, Rintaro Ikeshita, Marc Delcroix, Reinhold Haeb-Umbach, 
Jointly Optimal Denoising, Dereverberation, and Source Separation.
ICASSP2020
Marc Delcroix, Tsubasa Ochiai, Katerina Zmolíková, Keisuke Kinoshita, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki, 
Improving Speaker Discrimination of Target Speech Extraction With Time-Domain Speakerbeam.
ICASSP2022
Tina Raissi, Eugen Beck, Ralf Schlüter, Hermann Ney, 
Improving Factored Hybrid HMM Acoustic Modeling without State Tying.
ICASSP2022
Nils-Philipp Wynands, Wilfried Michel, Jan Rosendahl, Ralf Schlüter, Hermann Ney, 
Efficient Sequence Training of Attention Models Using Approximative Recombination.
ICASSP2022
Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Wilfried Michel, Alexander Gerstenberger, Ralf Schlüter, Hermann Ney, 
Conformer-Based Hybrid ASR System For Switchboard Dataset.
ICASSP2022
Wei Zhou, Zuoyun Zheng, Ralf Schlüter, Hermann Ney, 
On Language Model Integration for RNN Transducer Based Speech Recognition.
Interspeech2022
Felix Meyer, Wilfried Michel, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney, 
Automatic Learning of Subword Dependent Model Scales.
Interspeech2022
Zijian Yang, Yingbo Gao, Alexander Gerstenberger, Jintao Jiang, Ralf Schlüter, Hermann Ney, 
Self-Normalized Importance Sampling for Neural Language Modeling.
Interspeech2022
Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Ralf Schlüter, Hermann Ney, 
Improving the Training Recipe for a Robust Conformer-based Hybrid Model.
Interspeech2022
Wei Zhou, Wilfried Michel, Ralf Schlüter, Hermann Ney, 
Efficient Training of Neural Transducer for Speech Recognition.
Interspeech2021
Yingbo Gao, David Thulke, Alexander Gerstenberger, Khoa Viet Tran, Ralf Schlüter, Hermann Ney, 
On Sampling-Based Training Criteria for Neural Language Modeling.
Interspeech2021
Yu Qiao 0005, Wei Zhou, Elma Kerz, Ralf Schlüter, 
The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech.
Interspeech2021
Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney, 
Investigating Methods to Improve Language Model Integration for Attention-Based Encoder-Decoder ASR Models.
Interspeech2021
Albert Zeyer, André Merboldt, Wilfried Michel, Ralf Schlüter, Hermann Ney, 
Librispeech Transducer Model with Internal Language Model Prior Correction.
Interspeech2021
Wei Zhou, Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney, 
Equivalence of Segmental and Neural Transducer Modeling: A Proof of Concept.
Interspeech2021
Wei Zhou, Mohammad Zeineldeen, Zuoyun Zheng, Ralf Schlüter, Hermann Ney, 
Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition.
ICASSP2020
Vitalii Bozheniuk, Albert Zeyer, Ralf Schlüter, Hermann Ney, 
A Comprehensive Study of Residual CNNs for Acoustic Modeling in ASR.
ICASSP2020
Kazuki Irie, Alexander Gerstenberger, Ralf Schlüter, Hermann Ney, 
How Much Self-Attention Do We Need? Trading Attention for Feed-Forward Layers.
ICASSP2020
Wilfried Michel, Ralf Schlüter, Hermann Ney, 
Frame-Level MMI as A Sequence Discriminative Training Criterion for LVCSR.
ICASSP2020
Nick Rossenbach, Albert Zeyer, Ralf Schlüter, Hermann Ney, 
Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems.
ICASSP2020
Mohammad Zeineldeen, Albert Zeyer, Ralf Schlüter, Hermann Ney, 
Layer-Normalized LSTM for Hybrid-HMM and End-to-End ASR.
ICASSP2020
Wei Zhou, Wilfried Michel, Kazuki Irie, Markus Kitza, Ralf Schlüter, Hermann Ney, 
The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment.
Interspeech2022
Fangjun Kuang, Liyong Guo, Wei Kang 0006, Long Lin, Mingshuang Luo, Zengwei Yao, Daniel Povey, 
Pruned RNN-T for fast, memory-efficient ASR training.
ICASSP2021
Hang Lv 0001, Zhehuai Chen, Hainan Xu, Daniel Povey, Lei Xie 0001, Sanjeev Khudanpur, 
An Asynchronous WFST-Based Decoder for Automatic Speech Recognition.
ICASSP2021
Ke Li 0018, Daniel Povey, Sanjeev Khudanpur, 
A Parallelizable Lattice Rescoring Strategy with Neural Language Models.
ICASSP2021
Yiming Wang, Hang Lv 0001, Daniel Povey, Lei Xie 0001, Sanjeev Khudanpur, 
Wake Word Detection with Streaming Transformers.
Interspeech2021
Guoguo Chen, Shuzhou Chai, Guan-Bo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su 0002, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe 0001, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Zhao You, Zhiyong Yan, 
GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10,000 Hours of Transcribed Audio.
Interspeech2021
Junbo Zhang, Zhiwen Zhang, Yongqing Wang, Zhiyong Yan, Qiong Song, Yukai Huang, Ke Li 0018, Daniel Povey, Yujun Wang, 
speechocean762: An Open-Source Non-Native English Speech Corpus for Pronunciation Assessment.
ICASSP2020
Hugo Braun, Justin Luitjens, Ryan Leary, Tim Kaldewey, Daniel Povey, 
GPU-Accelerated Viterbi Exact Lattice Decoder for Batched Online and Offline Speech Recognition.
ICASSP2020
Ke Li 0018, Zhe Liu 0011, Tianxing He, Hongzhao Huang, Fuchun Peng, Daniel Povey, Sanjeev Khudanpur, 
An Empirical Study of Transformer-Based Neural Language Model Adaptation.
Interspeech2020
Pegah Ghahramani, Hossein Hadian, Daniel Povey, Hynek Hermansky, Sanjeev Khudanpur, 
An Alternative to MFCCs for ASR.
Interspeech2020
Ruizhe Huang, Ke Li 0018, Ashish Arora, Daniel Povey, Sanjeev Khudanpur, 
Efficient MDI Adaptation for n-Gram Language Models.
Interspeech2020
Ke Li 0018, Daniel Povey, Sanjeev Khudanpur, 
Neural Language Modeling with Implicit Cache Pointers.
Interspeech2020
Srikanth R. Madikeri, Banriskhem K. Khonglah, Sibo Tong, Petr Motlícek, Hervé Bourlard, Daniel Povey, 
Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems.
Interspeech2020
Yiwen Shao, Yiming Wang, Daniel Povey, Sanjeev Khudanpur, 
PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR.
Interspeech2020
Yiming Wang, Hang Lv 0001, Daniel Povey, Lei Xie 0001, Sanjeev Khudanpur, 
Wake Word Detection with Alignment-Free Lattice-Free MMI.
ICASSP2019
David Snyder, Daniel Garcia-Romero, Gregory Sell, Alan McCree, Daniel Povey, Sanjeev Khudanpur, 
Speaker Recognition for Multi-speaker Conversations Using X-vectors.
Interspeech2019
Daniel Garcia-Romero, David Snyder, Gregory Sell, Alan McCree, Daniel Povey, Sanjeev Khudanpur, 
x-Vector DNN Refinement with Full-Length Recordings for Speaker Recognition.
Interspeech2019
Daniel Garcia-Romero, David Snyder, Shinji Watanabe 0001, Gregory Sell, Alan McCree, Daniel Povey, Sanjeev Khudanpur, 
Speaker Recognition Benchmark Using the CHiME-5 Corpus.
Interspeech2019
Mousmita Sarma, Pegah Ghahremani, Daniel Povey, Nagendra Kumar Goel, Kandarpa Kumar Sarma, Najim Dehak, 
Improving Emotion Identification Using Phone Posteriors in Raw Speech Waveform Based DNN.
Interspeech2019
David Snyder, Jesús Villalba 0001, Nanxin Chen, Daniel Povey, Gregory Sell, Najim Dehak, Sanjeev Khudanpur, 
The JHU Speaker Recognition System for the VOiCES 2019 Challenge.
Interspeech2019
Jesús Villalba 0001, Nanxin Chen, David Snyder, Daniel Garcia-Romero, Alan McCree, Gregory Sell, Jonas Borgstrom, Fred Richardson, Suwon Shon, François Grondin, Réda Dehak, Leibny Paola García-Perera, Daniel Povey, Pedro A. Torres-Carrasquillo, Sanjeev Khudanpur, Najim Dehak, 
State-of-the-Art Speaker Recognition for Telephone and Video Speech: The JHU-MIT Submission for NIST SRE18.
ICASSP2022
Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan 0003, Sheng Zhao, Tan Lee, 
A Study on the Efficacy of Model Pre-Training In Developing Neural Text-to-Speech System.
Interspeech2022
Jonathan Him Nok Lee, Dehua Tao, Harold Chui, Tan Lee, Sarah Luk, Nicolette Wing Tung Lee, Koonkan Fung, 
Durational Patterning at Discourse Boundaries in Relation to Therapist Empathy in Psychotherapy.
Interspeech2022
Jingyu Li, Wei Liu, Tan Lee, 
EDITnet: A Lightweight Network for Unsupervised Domain Adaptation in Speaker Verification.
Interspeech2022
Si Ioi Ng, Cymie Wing-Yee Ng, Jiarui Wang, Tan Lee, 
Automatic Detection of Speech Sound Disorder in Child Speech Using Posterior-based Speaker Representations.
Interspeech2022
Zhiyuan Peng, Xuanji He, Ke Ding, Tan Lee, Guanglu Wan, 
Unifying Cosine and PLDA Back-ends for Speaker Verification.
Interspeech2022
Daxin Tan, Guangyan Zhang, Tan Lee, 
Environment Aware Text-to-Speech Synthesis.
Interspeech2022
Dehua Tao, Tan Lee, Harold Chui, Sarah Luk, 
Characterizing Therapist's Speaking Style in Relation to Empathy in Psychotherapy.
Interspeech2022
Dehua Tao, Tan Lee, Harold Chui, Sarah Luk, 
Hierarchical Attention Network for Evaluating Therapist Empathy in Counseling Session.
Interspeech2022
Yusheng Tian, Jingyu Li, Tan Lee, 
Transport-Oriented Feature Aggregation for Speaker Embedding Learning.
Interspeech2022
Guangyan Zhang, Kaitao Song, Xu Tan 0003, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao, 
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech.
TASLP2021
Xurong Xie, Xunying Liu, Tan Lee, Lan Wang, 
Bayesian Learning for Deep Neural Network Adaptation.
Interspeech2021
Si Ioi Ng, Cymie Wing-Yee Ng, Jingyu Li, Tan Lee, 
Detection of Consonant Errors in Disordered Speech Based on Consonant-Vowel Segment Embedding.
Interspeech2021
Zhiyuan Peng, Xu Li, Tan Lee, 
Pairing Weak with Strong: Twin Models for Defending Against Adversarial Attack on Speaker Verification.
Interspeech2021
Daxin Tan, Tan Lee, 
Fine-Grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement.
Interspeech2021
Guangyan Zhang, Ying Qin, Daxin Tan, Tan Lee, 
Applying the Information Bottleneck Principle to Prosodic Representation Learning.
ICASSP2020
Matthew King-Hang Ma, Tan Lee, Manson Cheuk-Man Fong, William Shi-Yuan Wang, 
Resting-State EEG-Based Biometrics with Signals Features Extracted by Multivariate Empirical Mode Decomposition.
ICASSP2020
Zhiyuan Peng, Siyuan Feng 0001, Tan Lee, 
Mixture Factorized Auto-Encoder for Unsupervised Hierarchical Deep Factorization of Speech Signal.
ICASSP2020
Yuzhong Wu, Tan Lee, 
Time-Frequency Feature Decomposition Based on Sound Duration for Acoustic Scene Classification.
Interspeech2020
Jingyu Li, Tan Lee, 
Text-Independent Speaker Verification with Dual Attention Network.
Interspeech2020
Shuiyang Mao, P. C. Ching, C.-C. Jay Kuo, Tan Lee, 
Advancing Multiple Instance Learning with Attention Modeling for Categorical Speech Emotion Recognition.
SpeechComm2022
Mrinmoy Bhattacharjee, S. R. Mahadeva Prasanna, Prithwijit Guha, 
Speech/music classification using phase-based and magnitude-based features.
Interspeech2022
Moakala Tzudir, Priyankoo Sarmah, S. R. Mahadeva Prasanna, 
Prosodic Information in Dialect Identification of a Tonal Language: The case of Ao.
Interspeech2021
Shikha Baghel, Mrinmoy Bhattacharjee, S. R. Mahadeva Prasanna, Prithwijit Guha, 
Automatic Detection of Shouted Speech Segments in Indian News Debates.
Interspeech2021
Moakala Tzudir, Shikha Baghel, Priyankoo Sarmah, S. R. Mahadeva Prasanna, 
Excitation Source Feature Based Dialect Identification in Ao - A Low Resource Language.
SpeechComm2020
Protima Nomo Sudro, S. R. Mahadeva Prasanna, 
Enhancement of cleft palate speech using temporal and spectral processing.
SpeechComm2020
Akhilesh Kumar Dubey, S. R. Mahadeva Prasanna, Samarendra Dandapat, 
Sinusoidal model-based hypernasality detection in cleft palate speech using CVCV sequence.
TASLP2020
Mrinmoy Bhattacharjee, S. R. Mahadeva Prasanna, Prithwijit Guha, 
Speech/Music Classification Using Features From Spectral Peaks.
TASLP2020
Vikram C. Mathad, S. R. Mahadeva Prasanna, 
Vowel Onset Point Based Screening of Misarticulated Stops in Cleft Lip and Palate Speech.
Interspeech2020
Ajish K. Abraham, M. Pushpavathi, N. Sreedevi, A. Navya, Vikram C. Mathad, S. R. Mahadeva Prasanna, 
Spectral Moment and Duration of Burst of Plosives in Speech of Children with Hearing Impairment and Typically Developing Children - A Comparative Study.
Interspeech2020
Ayush Agarwal, Jagabandhu Mishra, S. R. Mahadeva Prasanna, 
VOP Detection in Variable Speech Rate Condition.
TASLP2019
Vikram C. M., Nagaraj Adiga, S. R. Mahadeva Prasanna, 
Detection of Nasalized Voiced Stops in Cleft Palate Speech Using Epoch-Synchronous Features.
ICASSP2019
K. T. Deepak, Pavitra Kulkarni, U. Mudenagudi, S. R. M. Prasanna, 
Glottal Instants Extraction from Speech Signal Using Generative Adversarial Network.
Interspeech2019
Akhilesh Kumar Dubey, S. R. Mahadeva Prasanna, Samarendra Dandapat, 
Hypernasality Severity Detection Using Constant Q Cepstral Coefficients.
Interspeech2019
Sarfaraz Jelil, Abhishek Shrivastava, Rohan Kumar Das, S. R. Mahadeva Prasanna, Rohit Sinha 0003, 
SpeechMarker: A Voice Based Multi-Level Attendance Application.
Interspeech2019
Sishir Kalita, Protima Nomo Sudro, S. R. Mahadeva Prasanna, Samarendra Dandapat, 
Nasal Air Emission in Sibilant Fricatives of Cleft Lip and Palate Speech.
Interspeech2019
Protima Nomo Sudro, S. R. Mahadeva Prasanna, 
Modification of Devoicing Error in Cleft Lip and Palate Speech.
SpeechComm2018
Rajib Sharma, Ramesh K. Bhukya, S. R. M. Prasanna, 
Analysis of the Hilbert Spectrum for Text-Dependent Speaker Verification.
SpeechComm2018
Bidisha Sharma, S. R. Mahadeva Prasanna, 
Significance of sonority information for voiced/unvoiced decision in speech synthesis.
Interspeech2018
Kishalay Chakraborty, Senjam Shantirani Devi, Sanjeevan Devnath, S. R. Mahadeva Prasanna, Priyankoo Sarmah, 
Glotto Vibrato Graph: A Device and Method for Recording, Analysis and Visualization of Glottal Activity.
Interspeech2018
Abhishek Dey, Abhash Deka, Siddika Imani, Barsha Deka, Rohit Sinha 0003, S. R. Mahadeva Prasanna, Priyankoo Sarmah, K. Samudravijaya, S. R. Nirmala, 
AGROASSAM: A Web Based Assamese Speech Recognition Application for Retrieving Agricultural Commodity Price and Weather Information.
SpeechComm2023
Feng Dang, Hangting Chen, Qi Hu, Pengyuan Zhang, Yonghong Yan 0002, 
First coarse, fine afterward: A lightweight two-stage complex approach for monaural speech enhancement.
TASLP2022
Gaofeng Cheng, Haoran Miao, Runyan Yang, Keqi Deng, Yonghong Yan 0002, 
ETEH: Unified Attention-Based End-to-End ASR and KWS Architecture.
TASLP2022
Keqi Deng, Gaofeng Cheng, Runyan Yang, Yonghong Yan 0002, 
Alleviating ASR Long-Tailed Problem by Decoupling the Learning of Representation and Classification.
TASLP2022
Changfeng Gao, Gaofeng Cheng, Ta Li, Pengyuan Zhang, Yonghong Yan 0002, 
Self-Supervised Pre-Training for Attention-Based Encoder-Decoder ASR Model.
Interspeech2022
Yifan Chen, Yifan Guo, Qingxuan Li, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan 0002, 
Interrelate Training and Searching: A Unified Online Clustering Framework for Speaker Diarization.
Interspeech2022
Zehan Li, Haoran Miao, Keqi Deng, Gaofeng Cheng, Sanli Tian, Ta Li, Yonghong Yan 0002, 
Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies.
Interspeech2022
Yukun Liu, Ta Li, Pengyuan Zhang, Yonghong Yan 0002, 
NAS-SCAE: Searching Compact Attention-based Encoders For End-to-end Automatic Speech Recognition.
Interspeech2022
Sanli Tian, Keqi Deng, Zehan Li, Lingxuan Ye, Gaofeng Cheng, Ta Li, Yonghong Yan 0002, 
Knowledge Distillation For CTC-based Speech Recognition Via Consistent Acoustic Representation Learning.
Interspeech2022
Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, Lei Xie, Yonghong Yan 0002, 
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset.
Interspeech2022
Lingxuan Ye, Gaofeng Cheng, Runyan Yang, Zehui Yang, Sanli Tian, Pengyuan Zhang, Yonghong Yan 0002, 
Improving Recognition of Out-of-vocabulary Words in E2E Code-switching ASR by Fusing Speech Generation Methods.
Interspeech2022
Xueshuai Zhang, Jiakun Shen, Jun Zhou, Pengyuan Zhang, Yonghong Yan 0002, Zhihua Huang, Yanfen Tang, Yu Wang, Fujie Zhang, Shaoxing Zhang, Aijun Sun, 
Robust Cough Feature Extraction and Classification Method for COVID-19 Cough Detection Based on Vocalization Characteristics.
Interspeech2022
Han Zhu, Li Wang, Gaofeng Cheng, Jindong Wang 0001, Pengyuan Zhang, Yonghong Yan 0002, 
Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR.
Interspeech2022
Han Zhu, Jindong Wang 0001, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan 0002, 
Decoupled Federated Learning for ASR with Non-IID Data.
SpeechComm2021
Danyang Liu, Ji Xu, Pengyuan Zhang, Yonghong Yan 0002, 
A unified system for multilingual speech recognition and language identification.
TASLP2021
Longbiao Cheng, Xingwei Sun, Dingding Yao, Junfeng Li, Yonghong Yan 0002, 
Estimation Reliability Function Assisted Sound Source Localization With Enhanced Steering Vector Phase Difference.
ICASSP2021
Changfeng Gao, Gaofeng Cheng, Runyan Yang, Han Zhu, Pengyuan Zhang, Yonghong Yan 0002, 
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Text Data.
Interspeech2021
Jianjun Gu 0005, Longbiao Cheng, Xingwei Sun, Junfeng Li, Yonghong Yan 0002, 
Residual Echo and Noise Cancellation with Feature Attention Module and Multi-Domain Loss Function.
Interspeech2021
Zengqiang Shang, Zhihua Huang, Haozhe Zhang, Pengyuan Zhang, Yonghong Yan 0002, 
Incorporating Cross-Speaker Style Transfer for Multi-Language Text-to-Speech.
Interspeech2021
Haozhe Zhang, Zhihua Huang, Zengqiang Shang, Pengyuan Zhang, Yonghong Yan 0002, 
LinearSpeech: Parallel Text-to-Speech with Linear Complexity.
SpeechComm2020
Fan Yang, Ziteng Wang, Junfeng Li, Risheng Xia, Yonghong Yan 0002, 
Improving generative adversarial networks for speech enhancement through regularization of latent representations.
TASLP2022
Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Mathieu Fontaine 0002, Kazuyoshi Yoshii, Tatsuya Kawahara, 
Autoregressive Moving Average Jointly-Diagonalizable Spatial Covariance Analysis for Joint Source Separation and Dereverberation.
ICASSP2022
Sei Ueno, Tatsuya Kawahara, 
Phone-Informed Refinement of Synthesized Mel Spectrogram for Data Augmentation in Speech Recognition.
ICASSP2022
Heran Zhang, Masato Mimura, Tatsuya Kawahara, Kenkichi Ishizuka, 
Selective Multi-Task Learning For Speech Emotion Recognition Using Corpora Of Different Styles.
Interspeech2022
Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara, 
Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM.
Interspeech2022
Soky Kak, Sheng Li 0010, Masato Mimura, Chenhui Chu, Tatsuya Kawahara, 
Leveraging Simultaneous Translation for Enhancing Transcription of Low-resource Language via Cross Attention Mechanism.
Interspeech2022
Seiya Kawano, Muteki Arioka, Akishige Yuguchi, Kenta Yamamoto, Koji Inoue, Tatsuya Kawahara, Satoshi Nakamura 0001, Koichiro Yoshino, 
Multimodal Persuasive Dialogue Corpus using Teleoperated Android.
Interspeech2022
Jumon Nozaki, Tatsuya Kawahara, Kenkichi Ishizuka, Taiichi Hashimoto, 
End-to-end Speech-to-Punctuated-Text Recognition.
Interspeech2022
Hao Shi, Longbiao Wang, Sheng Li 0010, Jianwu Dang 0001, Tatsuya Kawahara, 
Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction.
ICASSP2021
Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe 0001, 
ORTHROS: Non-Autoregressive End-to-End Speech Translation with Dual-Decoder.
Interspeech2021
Hirofumi Inaguma, Tatsuya Kawahara, 
StableEmit: Selection Probability Discount for Reducing Emission Latency of Streaming Monotonic Attention ASR.
Interspeech2021
Hirofumi Inaguma, Tatsuya Kawahara, 
VAD-Free Streaming Hybrid CTC/Attention ASR for Unsegmented Recording.
NAACL2021
Hirofumi Inaguma, Tatsuya Kawahara, Shinji Watanabe 0001, 
Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation.
TASLP2020
Richeng Duan, Tatsuya Kawahara, Masatake Dantsuji, Hiroaki Nanjo, 
Cross-Lingual Transfer Learning of Non-Native Acoustic Modeling for Pronunciation Error Detection and Diagnosis.
TASLP2020
Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Kazuyoshi Yoshii, Tatsuya Kawahara, 
Fast Multichannel Nonnegative Matrix Factorization With Directivity-Aware Jointly-Diagonalizable Spatial Covariance Matrices for Blind Source Separation.
Interspeech2020
Viet-Trung Dang, Tianyu Zhao, Sei Ueno, Hirofumi Inaguma, Tatsuya Kawahara, 
End-to-End Speech-to-Dialog-Act Recognition.
Interspeech2020
Han Feng, Sei Ueno, Tatsuya Kawahara, 
End-to-End Speech Emotion Recognition Combined with Acoustic-to-Word ASR Model.
Interspeech2020
Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara, 
Distilling the Knowledge of BERT for Sequence-to-Sequence ASR.
Interspeech2020
Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara, 
CTC-Synchronous Training for Monotonic Attention Model.
Interspeech2020
Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara, 
Enhancing Monotonic Multihead Attention for Streaming ASR.
Interspeech2020
Kohei Matsuura, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara, 
Generative Adversarial Training Data Adaptation for Very Low-Resource Automatic Speech Recognition.
ICASSP2022
Zhehuai Chen, Yu Zhang 0033, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno 0001, Gary Wang, 
Tts4pretrain 2.0: Advancing the use of Text and Speech in ASR Pretraining with Consistency and Contrastive Losses.
ICASSP2022
Neeraj Gaur, Tongzhou Chen, Ehsan Variani, Parisa Haghani, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Multilingual Second-Pass Rescoring for Automatic Speech Recognition Systems.
Interspeech2022
Kartik Audhkhasi, Yinghui Huang, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition.
Interspeech2022
Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang 0033, Nicolás Serrano, 
Reducing Domain mismatch in Self-supervised speech pre-training.
Interspeech2022
Zhehuai Chen, Yu Zhang 0033, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno 0001, Ankur Bapna, Heiga Zen, 
MAESTRO: Matched Speech Text Representations through Modality Matching.
Interspeech2022
Ehsan Variani, Michael Riley 0001, David Rybach, Cyril Allauzen, Tongzhou Chen, Bhuvana Ramabhadran, 
On Adaptive Weight Interpolation of the Hybrid Autoregressive Transducer.
Interspeech2022
Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, W. Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach, 
Improving Rare Word Recognition with LM-aware MWER Training.
Interspeech2022
Gary Wang, Andrew Rosenberg, Bhuvana Ramabhadran, Fadi Biadsy, Jesse Emond, Yinghui Huang, Pedro J. Moreno 0001, 
Non-Parallel Voice Conversion for ASR Augmentation.
ICASSP2021
Rohan Doshi, Youzheng Chen, Liyang Jiang, Xia Zhang, Fadi Biadsy, Bhuvana Ramabhadran, Fang Chu, Andrew Rosenberg, Pedro J. Moreno 0001, 
Extending Parrotron: An End-to-End, Speech Conversion and Speech Recognition Model for Atypical Speech.
ICASSP2021
Neeraj Gaur, Brian Farris, Parisa Haghani, Isabel Leal, Pedro J. Moreno 0001, Manasa Prasad, Bhuvana Ramabhadran, Yun Zhu, 
Mixture of Informed Experts for Multilingual Speech Recognition.
Interspeech2021
Kartik Audhkhasi, Tongzhou Chen, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Mixture Model Attention: Flexible Streaming and Non-Streaming Automatic Speech Recognition.
Interspeech2021
Zhehuai Chen, Bhuvana Ramabhadran, Fadi Biadsy, Xia Zhang, Youzheng Chen, Liyang Jiang, Fang Chu, Rohan Doshi, Pedro J. Moreno 0001, 
Conformer Parrotron: A Faster and Stronger End-to-End Speech Conversion and Recognition Model for Atypical Speech.
Interspeech2021
Zhehuai Chen, Andrew Rosenberg, Yu Zhang 0033, Heiga Zen, Mohammadreza Ghodsi, Yinghui Huang, Jesse Emond, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation.
Interspeech2021
Isabel Leal, Neeraj Gaur, Parisa Haghani, Brian Farris, Pedro J. Moreno 0001, Manasa Prasad, Bhuvana Ramabhadran, Yun Zhu, 
Self-Adaptive Distillation for Multilingual Speech Recognition: Leveraging Student Independence.
Interspeech2021
Hainan Xu, Kartik Audhkhasi, Yinghui Huang, Jesse Emond, Bhuvana Ramabhadran, 
Regularizing Word Segmentation by Creating Misspellings.
ICASSP2020
Arindrima Datta, Bhuvana Ramabhadran, Jesse Emond, Anjuli Kannan, Brian Roark, 
Language-Agnostic Multilingual Modeling.
ICASSP2020
Gary Wang, Andrew Rosenberg, Zhehuai Chen, Yu Zhang 0033, Bhuvana Ramabhadran, Yonghui Wu, Pedro J. Moreno 0001, 
Improving Speech Recognition Using Consistent Predictions on Synthesized Speech.
Interspeech2020
Zhehuai Chen, Andrew Rosenberg, Yu Zhang 0033, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Improving Speech Recognition Using GAN-Based Speech Synthesis and Contrastive Unspoken Text Selection.
Interspeech2020
Gary Wang, Andrew Rosenberg, Zhehuai Chen, Yu Zhang 0033, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
SCADA: Stochastic, Consistent and Adversarial Data Augmentation to Improve ASR.
Interspeech2020
Yun Zhu, Parisa Haghani, Anshuman Tripathi, Bhuvana Ramabhadran, Brian Farris, Hainan Xu, Han Lu, Hasim Sak, Isabel Leal, Neeraj Gaur, Pedro J. Moreno 0001, Qian Zhang, 
Multilingual Speech Recognition with Self-Attention Structured Parameterization.
TASLP2022
Anton Ragni, Mark J. F. Gales, Oliver Rose, Katherine Knill, Alexandros Kastanos, Qiujia Li, Preben Ness, 
Increasing Context for Estimating Confidence Scores in Automatic Speech Recognition.
Interspeech2022
Stefano Bannò, Bhanu Balusu, Mark J. F. Gales, Kate Knill, Konstantinos Kyriakopoulos, 
View-Specific Assessment of L2 Spoken English.
ICASSP2021
Yiting Lu, Yu Wang 0027, Mark J. F. Gales, 
Efficient Use of End-to-End Data in Spoken Language Processing.
ICASSP2021
Xizi Wei, Mark J. F. Gales, Kate M. Knill, 
Analysing Bias in Spoken Language Assessment Using Concept Activation Vectors.
Interspeech2021
Qingyun Dou, Xixin Wu, Moquan Wan, Yiting Lu, Mark J. F. Gales, 
Deliberation-Based Multi-Pass Speech Synthesis.
ICASSP2020
Alexandros Kastanos, Anton Ragni, Mark J. F. Gales, 
Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks.
Interspeech2020
Qingyun Dou, Joshua Efiong, Mark J. F. Gales, 
Attention Forcing for Speech Synthesis.
Interspeech2020
Kate M. Knill, Linlin Wang, Yu Wang 0027, Xixin Wu, Mark J. F. Gales, 
Non-Native Children's Automatic Speech Recognition: The INTERSPEECH 2020 Shared Task ALTA Systems.
Interspeech2020
Konstantinos Kyriakopoulos, Kate M. Knill, Mark J. F. Gales, 
Automatic Detection of Accent and Lexical Pronunciation Errors in Spontaneous Non-Native English Speech.
Interspeech2020
Yiting Lu, Mark J. F. Gales, Yu Wang 0027, 
Spoken Language 'Grammatical Error Correction'.
Interspeech2020
Potsawee Manakul, Mark J. F. Gales, Linlin Wang, 
Abstractive Spoken Document Summarization Using Hierarchical Model with Multi-Stage Attention Diversity Optimization.
Interspeech2020
Vyas Raina, Mark J. F. Gales, Kate M. Knill, 
Universal Adversarial Attacks on Spoken Language Assessment Systems.
Interspeech2020
Xixin Wu, Kate M. Knill, Mark J. F. Gales, Andrey Malinin, 
Ensemble Approaches for Uncertainty in Spoken Language Assessment.
TASLP2019
Xie Chen 0001, Xunying Liu, Yu Wang 0027, Anton Ragni, Jeremy Heng Meng Wong, Mark J. F. Gales, 
Exploiting Future Word Contexts in Neural Network Language Models for Speech Recognition.
TASLP2019
Jeremy Heng Meng Wong, Mark John Francis Gales, Yu Wang 0027, 
General Sequence Teacher-Student Learning.
ICASSP2019
Kate M. Knill, Mark J. F. Gales, P. P. Manakul, Andrew Caines, 
Automatic Grammatical Error Detection of Non-native Spoken Learner English.
ICASSP2019
Qiujia Li, Preben Ness, Anton Ragni, Mark J. F. Gales, 
Bi-directional Lattice Recurrent Neural Networks for Confidence Estimation.
Interspeech2019
Konstantinos Kyriakopoulos, Kate M. Knill, Mark J. F. Gales, 
A Deep Learning Approach to Automatic Characterisation of Rhythm in Non-Native English Speech.
Interspeech2019
Yiting Lu, Mark J. F. Gales, Kate M. Knill, P. P. Manakul, Linlin Wang, Yu Wang 0027, 
Impact of ASR Performance on Spoken Grammatical Error Detection.
SpeechComm2018
Yu Wang 0027, Mark J. F. Gales, Kate M. Knill, Konstantinos Kyriakopoulos, Andrey Malinin, Rogier C. van Dalen, M. Rashid, 
Towards automatic assessment of spontaneous spoken English.
TASLP2022
Shang-Yi Chuang, Hsin-Min Wang, Yu Tsao 0001, 
Improved Lite Audio-Visual Speech Enhancement.
ICASSP2022
Kuan-Chen Wang, Kai-Chun Liu, Hsin-Min Wang, Yu Tsao 0001, 
EMGSE: Acoustic/EMG Fusion for Multimodal Speech Enhancement.
ICASSP2022
Haibin Wu, Heng-Cheng Kuo, Naijun Zheng, Kuo-Hsuan Hung, Hung-Yi Lee, Yu Tsao 0001, Hsin-Min Wang, Helen Meng, 
Partially Fake Audio Detection by Self-Attention-Based Fake Span Discovery.
Interspeech2022
Wen-Chin Huang, Erica Cooper, Yu Tsao 0001, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi, 
The VoiceMOS Challenge 2022.
Interspeech2022
Hung-Shin Lee, Pin-Tuan Huang, Yao-Fei Cheng, Hsin-Min Wang, 
Chain-based Discriminative Autoencoders for Speech Recognition.
Interspeech2022
Chi-Chang Lee, Cheng-Hung Hu, Yu-Chen Lin, Chu-Song Chen, Hsin-Min Wang, Yu Tsao 0001, 
NASTAR: Noise Adaptive Speech Enhancement with Target-Conditional Resampling.
Interspeech2022
Fan-Lin Wang, Hung-Shin Lee, Yu Tsao 0001, Hsin-Min Wang, 
Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks.
Interspeech2022
Ryandhimas Edo Zezario, Fei Chen 0011, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao 0001, 
MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids.
Interspeech2022
Ryandhimas Edo Zezario, Szu-Wei Fu, Fei Chen 0011, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao 0001, 
MTI-Net: A Multi-Target Speech Intelligibility Prediction Model.
ICASSP2021
Wen-Chin Huang, Chia-Hua Wu, Shang-Bao Luo, Kuan-Yu Chen, Hsin-Min Wang, Tomoki Toda, 
Speech Recognition by Simply Fine-Tuning BERT.
Interspeech2021
Yao-Fei Cheng, Hung-Shin Lee, Hsin-Min Wang, 
AlloST: Low-Resource Speech Translation Without Source Transcription.
Interspeech2021
Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Ching-Feng Liu, Yu Tsao 0001, Hsin-Min Wang, Tomoki Toda, 
A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion.
Interspeech2021
Fan-Lin Wang, Yu-Huai Peng, Hung-Shin Lee, Hsin-Min Wang, 
Dual-Path Filter Network: Speaker-Aware Modeling for Speech Separation.
Interspeech2021
Yi-Chiao Wu, Cheng-Hung Hu, Hung-Shin Lee, Yu-Huai Peng, Wen-Chin Huang, Yu Tsao 0001, Hsin-Min Wang, Tomoki Toda, 
Relational Data Selection for Data Augmentation of Speaker-Dependent Multi-Band MelGAN Vocoder.
TASLP2020
Hung-Shin Lee, Yu Tsao 0001, Shyh-Kang Jeng, Hsin-Min Wang, 
Subspace-Based Representation and Learning for Phonotactic Spoken Language Recognition.
TASLP2020
Chang-Le Liu, Sze-Wei Fu, You-Jin Li, Jen-Wei Huang, Hsin-Min Wang, Yu Tsao 0001, 
Multichannel Speech Enhancement by Raw Waveform-Mapping Using Fully Convolutional Networks.
TASLP2020
Cheng Yu, Ryandhimas E. Zezario, Syu-Siang Wang, Jonathan Sherman, Yi-Yen Hsieh, Xugang Lu, Hsin-Min Wang, Yu Tsao 0001, 
Speech Enhancement Based on Denoising Autoencoder With Multi-Branched Encoders.
ICASSP2020
Qian-Bei Hong, Chung-Hsien Wu, Hsin-Min Wang, Chien-Lin Huang, 
Statistics Pooling Time Delay Neural Network Based on X-Vector for Speaker Verification.
ICASSP2020
Qian-Bei Hong, Chung-Hsien Wu, Hsin-Min Wang, Chien-Lin Huang, 
Combining Deep Embeddings of Acoustic and Articulatory Features for Speaker Identification.
ICASSP2020
Ryandhimas E. Zezario, Tassadaq Hussain, Xugang Lu, Hsin-Min Wang, Yu Tsao 0001, 
Self-Supervised Denoising Autoencoder with Linear Regression Decoder for Speech Enhancement.
ICASSP2022
Takafumi Moriya, Takanori Ashihara, Atsushi Ando, Hiroshi Sato, Tomohiro Tanaka, Kohei Matsuura, Ryo Masumura, Marc Delcroix, Takahiro Shinozaki, 
Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration.
Interspeech2022
Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura, 
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data.
Interspeech2022
Ryo Masumura, Yoshihiro Yamazaki, Saki Mizuno, Naoki Makishima, Mana Ihori, Mihiro Uchida, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Shota Orihashi, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando, 
End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training.
Interspeech2022
Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari, 
Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis.
Interspeech2022
Fumio Nihei, Ryo Ishii, Yukiko I. Nakano, Kyosuke Nishida, Ryo Masumura, Atsushi Fukayama, Takao Nakamura, 
Dialogue Acts Aided Important Utterance Detection Based on Multiparty and Multimodal Information.
Interspeech2022
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura, 
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.
Interspeech2022
Akihiko Takashima, Ryo Masumura, Atsushi Ando, Yoshihiro Yamazaki, Mihiro Uchida, Shota Orihashi, 
Interactive Co-Learning with Cross-Modal Transformer for Audio-Visual Emotion Recognition.
Interspeech2022
Tomohiro Tanaka, Ryo Masumura, Hiroshi Sato, Mana Ihori, Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, 
Domain Adversarial Self-Supervised Speech Representation Learning for Improving Unknown Domain Downstream Tasks.
ICASSP2021
Atsushi Ando, Ryo Masumura, Hiroshi Sato, Takafumi Moriya, Takanori Ashihara, Yusuke Ijima, Tomoki Toda, 
Speech Emotion Recognition Based on Listener Adaptive Models.
ICASSP2021
Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura, 
MAPGN: Masked Pointer-Generator Network for Sequence-to-Sequence Pre-Training.
ICASSP2021
Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, Ryo Masumura, 
Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss.
ICASSP2021
Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, 
Hierarchical Transformer-Based Large-Context End-To-End ASR with Large-Context Knowledge Distillation.
ICASSP2021
Takafumi Moriya, Takanori Ashihara, Tomohiro Tanaka, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Yusuke Ijima, Ryo Masumura, Yusuke Shinohara, 
Simpleflat: A Simple Whole-Network Pre-Training Approach for RNN Transducer-Based End-to-End Speech Recognition.
Interspeech2021
Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura, 
Zero-Shot Joint Modeling of Multiple Spoken-Text-Style Conversion Tasks Using Switching Tokens.
Interspeech2021
Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura, 
Enrollment-Less Training for Personalized Voice Activity Detection.
Interspeech2021
Ryo Masumura, Daiki Okamura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, 
Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation.
Interspeech2021
Takafumi Moriya, Tomohiro Tanaka, Takanori Ashihara, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Ryo Masumura, Marc Delcroix, Taichi Asami, 
Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture.
Interspeech2021
Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Takafumi Moriya, Takanori Ashihara, Shota Orihashi, Naoki Makishima, 
Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition.
Interspeech2021
Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Shota Orihashi, Naoki Makishima, 
End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning.
TASLP2020
Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono, Tomoki Toda, 
Customer Satisfaction Estimation in Contact Center Calls Based on a Hierarchical Multi-Task Model.
Interspeech2022
Cécile Fougeron, Nicolas Audibert, Ina Kodrasi, Parvaneh Janbakhshi, Michaela Pernon, Nathalie Lévêque, Stephanie Borel, Marina Laganaro, Hervé Bourlard, Frédéric Assal, 
Comparison of 5 methods for the evaluation of intelligibility in mild to moderate French dysarthric speech.
Interspeech2022
Selen Hande Kabil, Hervé Bourlard, 
From Undercomplete to Sparse Overcomplete Autoencoders to Improve LF-MMI based Speech Recognition.
ICASSP2021
Deepak Baby, Hervé Bourlard, 
Speech Dereverberation Using Variational Autoencoders.
ICASSP2021
Parvaneh Janbakhshi, Ina Kodrasi, Hervé Bourlard, 
Automatic Dysarthric Speech Detection Exploiting Pairwise Distance-Based Convolutional Neural Networks.
ICASSP2021
Apoorv Vyas, Srikanth R. Madikeri, Hervé Bourlard, 
Lattice-Free Mmi Adaptation of Self-Supervised Pretrained Acoustic Models.
Interspeech2021
Srikanth R. Madikeri, Petr Motlícek, Hervé Bourlard, 
Multitask Adaptation with Lattice-Free MMI for Multi-Genre Speech Recognition of Low Resource Languages.
Interspeech2021
Apoorv Vyas, Srikanth R. Madikeri, Hervé Bourlard, 
Comparing CTC and LFMMI for Out-of-Domain Adaptation of wav2vec 2.0 Acoustic Model.
SpeechComm2020
Pranay Dighe, Afsaneh Asaei, Hervé Bourlard, 
On quantifying the quality of acoustic models in hybrid DNN-HMM ASR.
TASLP2020
Parvaneh Janbakhshi, Ina Kodrasi, Hervé Bourlard, 
Automatic Pathological Speech Intelligibility Assessment Exploiting Subspace-Based Analyses.
TASLP2020
Ina Kodrasi, Hervé Bourlard, 
Spectro-Temporal Sparsity Characterization for Dysarthric Speech Detection.
ICASSP2020
Parvaneh Janbakhshi, Ina Kodrasi, Hervé Bourlard, 
Synthetic Speech References for Automatic Pathological Speech Intelligibility Assessment.
ICASSP2020
Banriskhem K. Khonglah, Srikanth R. Madikeri, Subhadeep Dey, Hervé Bourlard, Petr Motlícek, Jayadev Billa, 
Incremental Semi-Supervised Learning for Multi-Genre Speech Recognition.
Interspeech2020
Ina Kodrasi, Michaela Pernon, Marina Laganaro, Hervé Bourlard, 
Automatic Discrimination of Apraxia of Speech and Dysarthria Using a Minimalistic Set of Handcrafted Features.
Interspeech2020
Srikanth R. Madikeri, Banriskhem K. Khonglah, Sibo Tong, Petr Motlícek, Hervé Bourlard, Daniel Povey, 
Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems.
SpeechComm2019
Pranay Dighe, Afsaneh Asaei, Hervé Bourlard, 
Low-rank and sparse subspace modeling of speech for DNN based acoustic modeling.
ICASSP2019
Parvaneh Janbakhshi, Ina Kodrasi, Hervé Bourlard, 
Pathological Speech Intelligibility Assessment Based on the Short-time Objective Intelligibility Measure.
ICASSP2019
Ina Kodrasi, Hervé Bourlard, 
Super-gaussianity of Speech Spectral Coefficients as a Potential Biomarker for Dysarthric Speech Detection.
ICASSP2019
Sibo Tong, Philip N. Garner, Hervé Bourlard, 
An Investigation of Multilingual ASR Using End-to-end LF-MMI.
Interspeech2019
Parvaneh Janbakhshi, Ina Kodrasi, Hervé Bourlard, 
Spectral Subspace Analysis for Automatic Assessment of Pathological Speech Intelligibility.
Interspeech2019
Sibo Tong, Apoorv Vyas, Philip N. Garner, Hervé Bourlard, 
Unbiased Semi-Supervised LF-MMI Training Using Dropout.
ICASSP2022
Jiangyu Han, Yanhua Long, Lukás Burget, Jan Cernocký, 
DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation and Extraction.
ICASSP2022
Ladislav Mosner, Oldrich Plchot, Lukás Burget, Jan Honza Cernocký, 
Multisv: Dataset for Far-Field Multi-Channel Speaker Verification.
Interspeech2022
Murali Karthick Baskar, Tim Herzig, Diana Nguyen, Mireia Díez, Tim Polzehl, Lukás Burget, Jan Cernocký, 
Speaker adaptation for Wav2vec2 based dysarthric ASR.
Interspeech2022
Martin Kocour, Katerina Zmolíková, Lucas Ondel, Jan Svec, Marc Delcroix, Tsubasa Ochiai, Lukás Burget, Jan Cernocký, 
Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model.
Interspeech2022
Junyi Peng, Rongzhi Gu, Ladislav Mosner, Oldrich Plchot, Lukás Burget, Jan Cernocký, 
Learnable Sparse Filterbank for Speaker Verification.
Interspeech2022
Themos Stafylakis, Ladislav Mosner, Oldrich Plchot, Johan Rohdin, Anna Silnova, Lukás Burget, Jan Cernocký, 
Training speaker embedding extractors using multi-speaker audio with unknown speaker boundaries.
ICASSP2021
Murali Karthick Baskar, Lukás Burget, Shinji Watanabe 0001, Ramón Fernandez Astudillo, Jan Honza Cernocký, 
Eat: Enhanced ASR-TTS for Self-Supervised Speech Recognition.
ICASSP2021
Hari Krishna Vydana, Martin Karafiát, Katerina Zmolíková, Lukás Burget, Honza Cernocký, 
Jointly Trained Transformers Models for Spoken Language Translation.
ICASSP2021
Bolaji Yusuf, Lucas Ondel, Lukás Burget, Jan Cernocký, Murat Saraçlar, 
A Hierarchical Subspace Model for Language-Attuned Acoustic Unit Discovery.
Interspeech2021
Ekaterina Egorova, Hari Krishna Vydana, Lukás Burget, Jan Cernocký, 
Out-of-Vocabulary Words Detection with Attention and CTC Alignments in an End-to-End ASR System.
Interspeech2021
Martin Kocour, Karel Veselý, Alexander Blatt, Juan Zuluaga-Gomez, Igor Szöke, Jan Cernocký, Dietrich Klakow, Petr Motlícek, 
Boosting of Contextual Information in ASR for Air-Traffic Call-Sign Recognition.
Interspeech2021
Junyi Peng, Xiaoyang Qu, Rongzhi Gu, Jianzong Wang, Jing Xiao 0006, Lukás Burget, Jan Cernocký, 
Effective Phase Encoding for End-To-End Speaker Verification.
Interspeech2021
Junyi Peng, Xiaoyang Qu, Jianzong Wang, Rongzhi Gu, Jing Xiao 0006, Lukás Burget, Jan Cernocký, 
ICSpk: Interpretable Complex Speaker Embedding Extractor from Raw Waveform.
Interspeech2021
Igor Szöke, Santosh Kesiraju, Ondrej Novotný, Martin Kocour, Karel Veselý, Jan Cernocký, 
Detecting English Speech in the Air Traffic Control Voice Communication.
Interspeech2021
Katerina Zmolíková, Marc Delcroix, Desh Raj, Shinji Watanabe 0001, Jan Cernocký, 
Auxiliary Loss Function for Target Speech Extraction and Recognition with Weak Supervision Based on Speaker Characteristics.
TASLP2020
Mireia Díez, Lukás Burget, Federico Landini, Jan Cernocký, 
Analysis of Speaker Diarization Based on Bayesian HMM With Eigenvoice Priors.
ICASSP2020
Shuai Wang 0016, Johan Rohdin, Oldrich Plchot, Lukás Burget, Kai Yu 0004, Jan Cernocký, 
Investigation of Specaugment for Deep Speaker Embedding Learning.
ICASSP2020
Mireia Díez, Lukás Burget, Federico Landini, Shuai Wang 0016, Honza Cernocký, 
Optimizing Bayesian Hmm Based X-Vector Clustering for the Second Dihard Speech Diarization Challenge.
ICASSP2019
Murali Karthick Baskar, Lukás Burget, Shinji Watanabe 0001, Martin Karafiát, Takaaki Hori, Jan Honza Cernocký, 
Promising Accurate Prefix Boosting for Sequence-to-sequence ASR.
Interspeech2019
Murali Karthick Baskar, Shinji Watanabe 0001, Ramón Fernandez Astudillo, Takaaki Hori, Lukás Burget, Jan Cernocký, 
Semi-Supervised Sequence-to-Sequence ASR Using Unpaired Speech and Text.
ICASSP2022
Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux, 
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR.
ICASSP2022
Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori, 
Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy.
ICASSP2022
Niko Moritz, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux, 
Sequence Transduction with Graph-Based Supervision.
Interspeech2022
Chiori Hori, Takaaki Hori, Jonathan Le Roux, 
Low-Latency Online Streaming VideoQA Using Audio-Visual Transformers.
Interspeech2022
Efthymios Tzinis, Gordon Wichern, Aswin Shanmugam Subramanian, Paris Smaragdis, Jonathan Le Roux, 
Heterogeneous Target Speech Separation.
TASLP2021
Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux, 
Convolutive Prediction for Monaural Speech Dereverberation and Noisy-Reverberant Speaker Separation.
ICASSP2021
Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training.
ICASSP2021
Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Capturing Multi-Resolution Context by Dilated Self-Attention.
ICASSP2021
Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Semi-Supervised Speech Recognition Via Graph-Based Temporal Classification.
Interspeech2021
Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori, 
Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition.
Interspeech2021
Chiori Hori, Takaaki Hori, Jonathan Le Roux, 
Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers.
Interspeech2021
Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux, 
Advanced Long-Context End-to-End Speech Recognition Using Context-Expanded Transformers.
Interspeech2021
Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition.
TASLP2020
Fatemeh Pishdadian, Gordon Wichern, Jonathan Le Roux, 
Finding Strength in Weakness: Learning to Separate Sounds With Weak Supervision.
ICASSP2020
Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe 0001, 
End-To-End Multi-Speaker Speech Recognition With Transformer.
ICASSP2020
Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Streaming Automatic Speech Recognition with the Transformer Model.
ICASSP2020
Fatemeh Pishdadian, Gordon Wichern, Jonathan Le Roux, 
Learning to Separate Sounds from Weakly Labeled Scenes.
ICASSP2020
Leda Sari, Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Unsupervised Speaker Adaptation Using Attention-Based Speaker Memory for End-to-End ASR.
Interspeech2020
Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux, 
Transformer-Based Long-Context End-to-End Speech Recognition.
Interspeech2020
Tejas Jayashankar, Jonathan Le Roux, Pierre Moulin, 
Detecting Audio Attacks on ASR Systems with Dropout Uncertainty.
SpeechComm2022
Takuma Okamoto, Keisuke Matsubara, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, 
Neural speech-rate conversion with multispeaker WaveNet vocoder.
Interspeech2022
Peng Shen, Xugang Lu, Hisashi Kawai, 
Transducer-based language embedding for spoken language identification.
TASLP2021
Xugang Lu, Peng Shen, Yu Tsao 0001, Hisashi Kawai, 
Coupling a Generative Model With a Discriminative Learning Framework for Speaker Verification.
TASLP2021
Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda, 
Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.
ICASSP2021
Xugang Lu, Peng Shen, Yu Tsao 0001, Hisashi Kawai, 
Unsupervised Neural Adaptation Model Based on Optimal Transport for Spoken Language Identification.
ICASSP2021
Keisuke Matsubara, Takuma Okamoto, Ryoichi Takashima, Tetsuya Takiguchi, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, 
High-Intelligibility Speech Synthesis for Dysarthric Speakers with LPCNet-Based TTS and CycleVAE-Based VC.
ICASSP2021
Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, 
Noise Level Limited Sub-Modeling for Diffusion Probabilistic Vocoders.
Interspeech2021
Masakiyo Fujimoto, Hisashi Kawai, 
Noise Robust Acoustic Modeling for Single-Channel Speech Recognition Based on a Stream-Wise Transformer Architecture.
TASLP2020
Peng Shen, Xugang Lu, Sheng Li 0010, Hisashi Kawai, 
Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification.
ICASSP2020
Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, 
Transformer-Based Text-to-Speech with Weighted Forced Attention.
Interspeech2020
Peng Shen, Xugang Lu, Hisashi Kawai, 
Investigation of NICT Submission for Short-Duration Speaker Verification Challenge 2020.
Interspeech2020
Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda, 
Quasi-Periodic Parallel WaveGAN Vocoder: A Non-Autoregressive Pitch-Dependent Dilated Convolution Model for Parametric Speech Generation.
ICASSP2019
Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, 
Investigations of Real-time Gaussian Fftnet and Parallel Wavenet Neural Vocoders with Simple Acoustic Features.
ICASSP2019
Peng Shen, Xugang Lu, Sheng Li 0010, Hisashi Kawai, 
Interactive Learning of Teacher-student Model for Short Utterance Spoken Language Identification.
Interspeech2019
Sheng Li 0010, Chenchen Ding, Xugang Lu, Peng Shen, Tatsuya Kawahara, Hisashi Kawai, 
End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition.
Interspeech2019
Masakiyo Fujimoto, Hisashi Kawai, 
One-Pass Single-Channel Noisy Speech Recognition Using a Combination of Noisy and Enhanced Features.
Interspeech2019
Sheng Li 0010, Xugang Lu, Chenchen Ding, Peng Shen, Tatsuya Kawahara, Hisashi Kawai, 
Investigating Radical-Based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese.
Interspeech2019
Sheng Li 0010, Raj Dabre, Xugang Lu, Peng Shen, Tatsuya Kawahara, Hisashi Kawai, 
Improving Transformer-Based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation.
Interspeech2019
Chien-Feng Liao, Yu Tsao 0001, Xugang Lu, Hisashi Kawai, 
Incorporating Symbolic Sequential Modeling for Speech Enhancement.
Interspeech2019
Xugang Lu, Peng Shen, Sheng Li 0010, Yu Tsao 0001, Hisashi Kawai, 
Class-Wise Centroid Distance Metric Learning for Acoustic Event Detection.
ICASSP2022
Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux, 
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR.
ICASSP2022
Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori, 
Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy.
ICASSP2022
Niko Moritz, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux, 
Sequence Transduction with Graph-Based Supervision.
Interspeech2022
Chiori Hori, Takaaki Hori, Jonathan Le Roux, 
Low-Latency Online Streaming VideoQA Using Audio-Visual Transformers.
ICASSP2021
Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training.
ICASSP2021
Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Capturing Multi-Resolution Context by Dilated Self-Attention.
ICASSP2021
Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Semi-Supervised Speech Recognition Via Graph-Based Temporal Classification.
Interspeech2021
Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori, 
Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition.
Interspeech2021
Chiori Hori, Takaaki Hori, Jonathan Le Roux, 
Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers.
Interspeech2021
Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux, 
Advanced Long-Context End-to-End Speech Recognition Using Context-Expanded Transformers.
Interspeech2021
Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition.
TASLP2020
Ruizhi Li, Xiaofei Wang 0007, Sri Harish Mallidi, Shinji Watanabe 0001, Takaaki Hori, Hynek Hermansky, 
Multi-Stream End-to-End Speech Recognition.
ICASSP2020
Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Streaming Automatic Speech Recognition with the Transformer Model.
ICASSP2020
Leda Sari, Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Unsupervised Speaker Adaptation Using Attention-Based Speaker Memory for End-to-End ASR.
Interspeech2020
Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux, 
Transformer-Based Long-Context End-to-End Speech Recognition.
Interspeech2020
Niko Moritz, Gordon Wichern, Takaaki Hori, Jonathan Le Roux, 
All-in-One Transformer: Unifying Speech Recognition, Audio Tagging, and Event Detection.
ICASSP2019
Murali Karthick Baskar, Lukás Burget, Shinji Watanabe 0001, Martin Karafiát, Takaaki Hori, Jan Honza Cernocký, 
Promising Accurate Prefix Boosting for Sequence-to-sequence ASR.
ICASSP2019
Jaejin Cho, Shinji Watanabe 0001, Takaaki Hori, Murali Karthick Baskar, Hirofumi Inaguma, Jesús Villalba 0001, Najim Dehak, 
Language Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition.
ICASSP2019
Takaaki Hori, Ramón Fernandez Astudillo, Tomoki Hayashi, Yu Zhang 0033, Shinji Watanabe 0001, Jonathan Le Roux, 
Cycle-consistency Training for End-to-end Speech Recognition.
ICASSP2019
Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Triggered Attention for End-to-end Speech Recognition.
SpeechComm2022
Hongning Zhu, Kong Aik Lee, Haizhou Li 0001, 
Discriminative speaker embedding with serialized multi-layer multi-head attention.
ICASSP2022
Tianchi Liu 0004, Rohan Kumar Das, Kong Aik Lee, Haizhou Li 0001, 
MFA: TDNN with Multi-Scale Frequency-Channel Attention for Text-Independent Speaker Verification with Short Utterances.
ICASSP2022
Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li 0001, 
Self-Supervised Speaker Recognition with Loss-Gated Learning.
ICASSP2022
Fan Yu, Shiliang Zhang, Pengcheng Guo, Yihui Fu, Zhihao Du, Siqi Zheng, Weilong Huang, Lei Xie 0001, Zheng-Hua Tan, DeLiang Wang, Yanmin Qian, Kong Aik Lee, Zhijie Yan, Bin Ma 0001, Xin Xu, Hui Bu, 
Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.
ICASSP2022
Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, Hui Chen, 
Learning Domain-Invariant Transformation for Speaker Verification.
Interspeech2022
Qiongqiong Wang, Kong Aik Lee, Tianchi Liu 0004, 
Scoring of Large-Margin Embeddings for Speaker Verification: Cosine or PLDA?
ICASSP2021
Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, Hui Chen, 
Meta-Learning for Cross-Channel Speaker Verification.
Interspeech2021
Tomi Kinnunen, Andreas Nautsch, Md. Sahidullah, Nicholas W. D. Evans, Xin Wang 0037, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee, 
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing.
Interspeech2021
Yibo Wu, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, 
Joint Feature Enhancement and Speaker Recognition with Multi-Objective Task-Oriented Network.
Interspeech2021
Li Zhang 0084, Qing Wang 0039, Kong Aik Lee, Lei Xie 0001, Haizhou Li 0001, 
Multi-Level Transfer Learning from Near-Field to Far-Field Speaker Verification.
Interspeech2021
Hongning Zhu, Kong Aik Lee, Haizhou Li 0001, 
Serialized Multi-Layer Multi-Head Attention for Neural Speaker Embedding.
TASLP2020
Tomi Kinnunen, Héctor Delgado, Nicholas W. D. Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang 0037, Md. Sahidullah, Junichi Yamagishi, Douglas A. Reynolds, 
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals.
TASLP2020
Ivan Kukanov, Trung Ngo Trong, Ville Hautamäki, Sabato Marco Siniscalchi, Valerio Mario Salerno, Kong Aik Lee, 
Maximal Figure-of-Merit Framework to Detect Multi-Label Phonetic Features for Spoken Language Recognition.
ICASSP2020
Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Takafumi Koshinaka, 
A Generalized Framework for Domain Adaptation of PLDA in Speaker Recognition.
Interspeech2020
Kosuke Akimoto, Seng Pei Liew, Sakiko Mishima, Ryo Mizushima, Kong Aik Lee, 
POCO: A Voice Spoofing and Liveness Detection Corpus Based on Pop Noise.
Interspeech2020
Kong Aik Lee, Koji Okabe, Hitoshi Yamamoto, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Keisuke Ishikawa, Koichi Shinoda, 
NEC-TT Speaker Verification System for SRE'19 CTS Challenge.
Interspeech2020
Alexey Sholokhov, Tomi Kinnunen, Ville Vestman, Kong Aik Lee, 
Extrapolating False Alarm Rates in Automatic Speaker Verification.
Interspeech2020
Hossein Zeinali, Kong Aik Lee, Jahangir Alam, Lukás Burget, 
SdSV Challenge 2020: Large-Scale Evaluation of Short-Duration Speaker Verification.
Interspeech2020
Hanyi Zhang, Longbiao Wang, Yunchun Zhang, Meng Liu, Kong Aik Lee, Jianguo Wei, 
Adversarial Separation Network for Speaker Recognition.
Interspeech2020
Dao Zhou, Longbiao Wang, Kong Aik Lee, Yibo Wu, Meng Liu, Jianwu Dang 0001, Jianguo Wei, 
Dynamic Margin Softmax Loss for Speaker Verification.
SpeechComm2023
Paula Andrea Pérez-Toro, Tomás Arias-Vergara, Philipp Klumpp, Juan Camilo Vásquez-Correa, Maria Schuster, Elmar Nöth, Juan Rafael Orozco-Arroyave, 
Depression assessment in people with Parkinson's disease: The combination of acoustic features and natural language processing.
Interspeech2022
Sebastian Peter Bayerl, Dominik Wagner, Elmar Nöth, Korbinian Riedhammer, 
Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0.
Interspeech2022
Christian Bergler, Alexander Barnhill, Dominik Perrin, Manuel Schmitt, Andreas K. Maier, Elmar Nöth, 
ORCA-WHISPER: An Automatic Killer Whale Sound Type Generation Toolkit Using Deep Learning.
Interspeech2022
Teena tom Dieck, Paula Andrea Pérez-Toro, Tomas Arias, Elmar Nöth, Philipp Klumpp, 
Wav2vec behind the Scenes: How end2end Models learn Phonetics.
Interspeech2022
Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas K. Maier, Seung Hee Yang, 
Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition.
Interspeech2022
Paula Andrea Pérez-Toro, Philipp Klumpp, Abner Hernandez, Tomas Arias, Patricia Lillo, Andrea Slachevsky, Adolfo Martín García, Maria Schuster, Andreas K. Maier, Elmar Nöth, Juan Rafael Orozco-Arroyave, 
Alzheimer's Detection from English to Spanish Using Acoustic and Linguistic Embeddings.
Interspeech2022
P. Schäfer, Paula Andrea Pérez-Toro, Philipp Klumpp, Juan Rafael Orozco-Arroyave, Elmar Nöth, Andreas K. Maier, A. Abad, Maria Schuster, Tomás Arias-Vergara, 
CoachLea: an Android Application to Evaluate the Speech Production and Perception of Children with Hearing Loss.
Interspeech2022
Tobias Weise, Philipp Klumpp, Andreas K. Maier, Elmar Nöth, Björn Heismann, Maria Schuster, Seung Hee Yang, 
Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment.
ICASSP2021
Paula Andrea Pérez-Toro, Juan Camilo Vásquez-Correa, Tomas Arias-Vergara, Philipp Klumpp, M. Sierra-Castrillón, M. E. Roldán-López, D. Aguillón, L. Hincapié-Henao, Carlos Andrés Tobón-Quintero, Tobias Bocklet, Maria Schuster, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
Acoustic and Linguistic Analyses to Assess Early-Onset and Genetic Alzheimer's Disease.
ICASSP2021
Juan Camilo Vásquez-Correa, Tomas Arias-Vergara, Philipp Klumpp, Paula Andrea Pérez-Toro, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
End-2-End Modeling of Speech and Gait from Patients with Parkinson's Disease: Comparison Between High Quality Vs. Smartphone Data.
Interspeech2021
Christian Bergler, Manuel Schmitt, Andreas K. Maier, Helena Symonds, Paul Spong, Steven R. Ness, George Tzanetakis, Elmar Nöth, 
ORCA-SLANG: An Automatic Multi-Stage Semi-Supervised Deep Learning Framework for Large-Scale Killer Whale Call Type Identification.
Interspeech2021
Carlos A. Ferrer, Efren Aragón, María E. Hdez-Díaz, Marc S. De Bodt, Roman Cmejla, Marina Englert, Mara Behlau, Elmar Nöth, 
Modeling Dysphonia Severity as a Function of Roughness and Breathiness Ratings in the GRBAS Scale.
Interspeech2021
Philipp Klumpp, Tobias Bocklet, Tomas Arias-Vergara, Juan Camilo Vásquez-Correa, Paula Andrea Pérez-Toro, Sebastian P. Bayerl, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
The Phonetic Footprint of Covid-19?
Interspeech2021
Paula Andrea Pérez-Toro, Sebastian P. Bayerl, Tomas Arias-Vergara, Juan Camilo Vásquez-Correa, Philipp Klumpp, Maria Schuster, Elmar Nöth, Juan Rafael Orozco-Arroyave, Korbinian Riedhammer, 
Influence of the Interviewer on the Automatic Assessment of Alzheimer's Disease in the Context of the ADReSSo Challenge.
Interspeech2021
Juan Camilo Vásquez-Correa, Julian Fritsch, Juan Rafael Orozco-Arroyave, Elmar Nöth, Mathew Magimai-Doss, 
On Modeling Glottal Source Information for Phonation Assessment in Parkinson's Disease.
SpeechComm2020
Juan Camilo Vásquez-Correa, Tomas Arias-Vergara, Maria Schuster, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
Parallel Representation Learning for the Classification of Pathological Speech: Studies on Parkinson's Disease and Cleft Lip and Palate.
ICASSP2020
Juan Camilo Vásquez-Correa, Tobias Bocklet, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
Comparison of User Models Based on GMM-UBM and I-Vectors for Speech, Handwriting, and Gait Assessment of Parkinson's Disease Patients.
Interspeech2020
Christian Bergler, Manuel Schmitt, Andreas Maier 0001, Simeon Smeele, Volker Barth, Elmar Nöth, 
ORCA-CLEAN: A Deep Denoising Toolkit for Killer Whale Communication.
Interspeech2020
Philipp Klumpp, Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Paula Andrea Pérez-Toro, Florian Hönig, Elmar Nöth, Juan Rafael Orozco-Arroyave, 
Surgical Mask Detection with Deep Recurrent Phonetic Models.
ICASSP2019
Julian Fritsch, Sebastian Wankerl, Elmar Nöth, 
Automatic Diagnosis of Alzheimer's Disease Using Neural Network Language Models.
ICASSP2022
Roshan Sharma, Shruti Palaskar, Alan W. Black, Florian Metze, 
End-to-End Speech Summarization Using Restricted Self-Attention.
Interspeech2022
Juncheng Li 0001, Shuhui Qu, Po-Yao Huang 0001, Florian Metze, 
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification.
Interspeech2022
Xinjian Li, Florian Metze, David R. Mortensen, Alan W. Black, Shinji Watanabe 0001, 
ASR2K: Speech Recognition for Around 2000 Languages without Audio.
ICASSP2021
Xinjian Li, David R. Mortensen, Florian Metze, Alan W. Black, 
Multilingual Phonetic Dataset for Low Resource Speech Recognition.
Interspeech2021
Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan 0002, Siddharth Dalmia, Florian Metze, Shinji Watanabe 0001, Alan W. Black, 
Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding.
Interspeech2021
Xinjian Li, Juncheng Li 0001, Florian Metze, Alan W. Black, 
Hierarchical Phone Recognition with Compositional Phonetics.
Interspeech2021
Shruti Palaskar, Ruslan Salakhutdinov, Alan W. Black, Florian Metze, 
Multimodal Speech Summarization Through Semantic Concept Learning.
Interspeech2021
Brian Yan, Siddharth Dalmia, David R. Mortensen, Florian Metze, Shinji Watanabe 0001, 
Differentiable Allophone Graphs for Language-Universal Speech Recognition.
TASLP2020
Odette Scharenborg, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller 0001, 
Speech Technology for Unwritten Languages.
ICASSP2020
Xinjian Li, Siddharth Dalmia, Juncheng Li 0001, Matthew Lee 0012, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W. Black, Florian Metze, 
Universal Phone Recognition with a Multilingual Allophone System.
ICASSP2020
Anirudh Mani, Shruti Palaskar, Nimshi Venkat Meripo, Sandeep Konam, Florian Metze, 
ASR Error Correction and Domain Adaptation Using Machine Translation.
ICASSP2020
Tejas Srinivasan, Ramon Sanabria, Florian Metze, 
Looking Enhances Listening: Recovering Missing Speech Using Images.
Interspeech2020
Mahaveer Jain, Gil Keren, Jay Mahadeokar, Geoffrey Zweig, Florian Metze, Yatharth Saraf, 
Contextual RNN-T for Open Domain ASR.
Interspeech2020
Zimeng Qiu, Yiyuan Li, Xinjian Li, Florian Metze, William M. Campbell, 
Towards Context-Aware End-to-End Code-Switching Speech Recognition.
SpeechComm2019
Okko Räsänen, Shreyas Seshadri, Julien Karadayi, Eric Riebling, John Bunce, Alejandrina Cristià, Florian Metze, Marisa Casillas, Celia Rosemberg, Elika Bergelson, Melanie Soderstrom, 
Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech.
ICASSP2019
Ozan Caglayan, Ramon Sanabria, Shruti Palaskar, Loïc Barrault, Florian Metze, 
Multimodal Grounding for Sequence-to-sequence Speech Recognition.
ICASSP2019
Siddharth Dalmia, Xinjian Li, Alan W. Black, Florian Metze, 
Phoneme Level Language Models for Sequence Based Low Resource ASR.
Interspeech2019
Suyoun Kim, Siddharth Dalmia, Florian Metze, 
Cross-Attention End-to-End ASR for Two-Party Conversations.
Interspeech2019
Xinjian Li, Siddharth Dalmia, Alan W. Black, Florian Metze, 
Multilingual Speech Recognition with Corpus Relatedness Sampling.
Interspeech2019
Xinjian Li, Zhong Zhou, Siddharth Dalmia, Alan W. Black, Florian Metze, 
SANTLR: Speech Annotation Toolkit for Low Resource Languages.
ICASSP2022
Huang-Cheng Chou, Wei-Cheng Lin, Chi-Chun Lee, Carlos Busso, 
Exploiting Annotators' Typed Description of Emotion Perception to Maximize Utilization of Ratings for Speech Emotion Recognition.
ICASSP2022
Lucas Goncalves, Carlos Busso, 
AuxFormer: Robust Approach to Audiovisual Emotion Recognition.
ICASSP2022
Seong-Gyun Leem, Daniel Fulford, Jukka-Pekka Onnela, David Gard, Carlos Busso, 
Not All Features are Equal: Selection of Robust Features for Speech Emotion Recognition in Noisy Environments.
Interspeech2022
Huang-Cheng Chou, Chi-Chun Lee, Carlos Busso, 
Exploiting Co-occurrence Frequency of Emotions in Perceptual Evaluations To Train A Speech Emotion Classifier.
Interspeech2022
Lucas Goncalves, Carlos Busso, 
Improving Speech Emotion Recognition Using Self-Supervised Learning with Domain-Specific Audiovisual Tasks.
TASLP2021
Kazi Nazmul Haque, Rajib Rana, Jiajun Liu, John H. L. Hansen, Nicholas Cummins, Carlos Busso, Björn W. Schuller, 
Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation Using Fewer Labelled Audio Data.
Interspeech2021
Seong-Gyun Leem, Daniel Fulford, Jukka-Pekka Onnela, David Gard, Carlos Busso, 
Separation of Emotional and Reconstruction Embeddings on Ladder Network to Improve Speech Emotion Recognition Robustness in Noisy Conditions.
Interspeech2021
Jarrod Luckenbaugh, Samuel Abplanalp, Rachel Gonzalez, Daniel Fulford, David Gard, Carlos Busso, 
Voice Activity Detection with Teacher-Student Domain Emulation.
TASLP2020
Srinivas Parthasarathy, Carlos Busso, 
Semi-Supervised Speech Emotion Recognition With Ladder Networks.
ICASSP2020
Kusha Sridhar, Carlos Busso, 
Modeling Uncertainty in Predicting Emotional Attributes from Spontaneous Speech.
Interspeech2020
Wei-Cheng Lin, Carlos Busso, 
An Efficient Temporal Modeling Approach for Speech Emotion Recognition by Mapping Varied Duration Sentences into Fixed Number of Chunks.
Interspeech2020
Luz Martinez-Lucas, Mohammed Abdelwahab 0001, Carlos Busso, 
The MSP-Conversation Corpus.
Interspeech2020
Kusha Sridhar, Carlos Busso, 
Ensemble of Students Taught by Probabilistic Teachers to Improve Speech Emotion Recognition.
SpeechComm2019
Najmeh Sadoughi, Carlos Busso, 
Speech-driven animation with meaningful behaviors.
SpeechComm2019
Fei Tao, Carlos Busso, 
End-to-end audiovisual speech activity detection with bimodal recurrent neural models.
TASLP2019
Reza Lotfian, Carlos Busso, 
Curriculum Learning for Speech Emotion Recognition From Crowdsourced Labels.
ICASSP2019
John B. Harvill, Mohammed Abdelwahab 0001, Reza Lotfian, Carlos Busso, 
Retrieving Speech Samples with Similar Emotional Content Using a Triplet Loss Function.
Interspeech2019
Kusha Sridhar, Carlos Busso, 
Speech Emotion Recognition with a Reject Option.
SpeechComm2018
Najmeh Sadoughi, Yang Liu, Carlos Busso, 
Meaningful head movements driven by emotional synthetic speech.
TASLP2018
Mohammed Abdelwahab 0001, Carlos Busso, 
Domain Adversarial for Acoustic Emotion Recognition.
Interspeech2022
Jaejin Cho, Raghavendra Pappagari, Piotr Zelasko, Laureano Moro-Velázquez, Jesús Villalba 0001, Najim Dehak, 
Non-contrastive self-supervised learning of utterance-level speech representations.
Interspeech2022
Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesús Villalba 0001, Sanjeev Khudanpur, Najim Dehak, 
Defense against Adversarial Attacks on Hybrid Speech Recognition System using Adversarial Fine-tuning with Denoiser.
Interspeech2022
Sonal Joshi, Saurabh Kataria, Jesús Villalba 0001, Najim Dehak, 
AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification.
Interspeech2022
Saurabh Kataria, Jesús Villalba 0001, Laureano Moro-Velázquez, Najim Dehak, 
Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification.
Interspeech2022
Magdalena Rybicka, Jesús Villalba 0001, Najim Dehak, Konrad Kowalczyk, 
End-to-End Neural Speaker Diarization with an Iterative Refinement of Non-Autoregressive Attention-based Attractors.
Interspeech2022
Yiwen Shao, Jesús Villalba 0001, Sonal Joshi, Saurabh Kataria, Sanjeev Khudanpur, Najim Dehak, 
Chunking Defense for Adversarial Attacks on ASR.
ICASSP2021
Nanxin Chen, Piotr Zelasko, Jesús Villalba 0001, Najim Dehak, 
Focus on the Present: A Regularization Method for the ASR Source-Target Attention Layer.
ICASSP2021
Jaejin Cho, Piotr Zelasko, Jesús Villalba 0001, Najim Dehak, 
Improving Reconstruction Loss Based Speaker Embedding in Unsupervised and Semi-Supervised Scenarios.
ICASSP2021
Saurabh Kataria, Jesús Villalba 0001, Najim Dehak, 
Perceptual Loss Based Speech Denoising with an Ensemble of Audio Pattern Recognition and Self-Supervised Models.
ICASSP2021
Raghavendra Pappagari, Jesús Villalba 0001, Piotr Zelasko, Laureano Moro-Velázquez, Najim Dehak, 
CopyPaste: An Augmentation Method for Speech Emotion Recognition.
Interspeech2021
Saurabhchand Bhati, Jesús Villalba 0001, Piotr Zelasko, Laureano Moro-Velázquez, Najim Dehak, 
Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation.
Interspeech2021
Nanxin Chen, Piotr Zelasko, Laureano Moro-Velázquez, Jesús Villalba 0001, Najim Dehak, 
Align-Denoise: Single-Pass Non-Autoregressive Speech Recognition.
Interspeech2021
Saurabh Kataria, Jesús Villalba 0001, Piotr Zelasko, Laureano Moro-Velázquez, Najim Dehak, 
Deep Feature CycleGANs: Speaker Identity Preserving Non-Parallel Microphone-Telephone Domain Adaptation for Speaker Verification.
Interspeech2021
Raghavendra Pappagari, Jaejin Cho, Sonal Joshi, Laureano Moro-Velázquez, Piotr Zelasko, Jesús Villalba 0001, Najim Dehak, 
Automatic Detection and Assessment of Alzheimer Disease Using Speech and Language Technologies in Low-Resource Scenarios.
Interspeech2021
Magdalena Rybicka, Jesús Villalba 0001, Piotr Zelasko, Najim Dehak, Konrad Kowalczyk, 
Spine2Net: SpineNet with Res2Net and Time-Squeeze-and-Excitation Blocks for Speaker Recognition.
Interspeech2021
Jesús Villalba 0001, Sonal Joshi, Piotr Zelasko, Najim Dehak, 
Representation Learning to Classify and Detect Adversarial Attacks Against Speaker and Speech Recognition Systems.
ICASSP2020
Saurabh Kataria, Phani Sankar Nidadavolu, Jesús Villalba 0001, Nanxin Chen, L. Paola García-Perera, Najim Dehak, 
Feature Enhancement with Deep Feature Losses for Speaker Verification.
ICASSP2020
Laureano Moro-Velázquez, Jesús Villalba 0001, Najim Dehak, 
Using X-Vectors to Automatically Detect Parkinson's Disease from Speech.
ICASSP2020
Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba 0001, L. Paola García-Perera, Najim Dehak, 
Unsupervised Feature Enhancement for Speaker Verification.
ICASSP2020
Raghavendra Pappagari, Tianzi Wang, Jesús Villalba 0001, Nanxin Chen, Najim Dehak, 
X-Vectors Meet Emotions: A Study On Dependencies Between Emotion and Speaker Recognition.
SpeechComm2023
Iván López-Espejo, Amin Edraki, Wai-Yip Chan, Zheng-Hua Tan, Jesper Jensen 0001, 
On the deficiency of intelligibility metrics as proxies for subjective intelligibility.
TASLP2022
Poul Hoang, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen 0001, 
Multichannel Speech Enhancement With Own Voice-Based Interfering Speech Suppression for Hearing Assistive Devices.
ICASSP2022
Andreas Jonas Fuglsig, Jan Østergaard, Jesper Jensen 0001, Lars Søndergaard Bertelsen, Peter Mariager, Zheng-Hua Tan, 
Joint Far- and Near-End Speech Intelligibility Enhancement Based on the Approximated Speech Intelligibility Index.
ICASSP2022
Fan Yu, Shiliang Zhang, Pengcheng Guo, Yihui Fu, Zhihao Du, Siqi Zheng, Weilong Huang, Lei Xie 0001, Zheng-Hua Tan, DeLiang Wang, Yanmin Qian, Kong Aik Lee, Zhijie Yan, Bin Ma 0001, Xin Xu, Hui Bu, 
Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.
Interspeech2022
Claus M. Larsen, Peter Koch 0001, Zheng-Hua Tan, 
Adversarial Multi-Task Deep Learning for Noise-Robust Voice Activity Detection with Low Algorithmic Delay.
TASLP2021
Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen 0001, 
A Novel Loss Function and Training Strategy for Noise-Robust Keyword Spotting.
TASLP2021
Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu 0004, Meng Yu 0003, Dong Yu 0001, Jesper Jensen 0001, 
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation.
ICASSP2021
Giovanni Morrone, Daniel Michelsanti, Zheng-Hua Tan, Jesper Jensen 0001, 
Audio-Visual Speech Inpainting with Deep Learning.
SpeechComm2020
Daniel Michelsanti, Zheng-Hua Tan, Sigurdur Sigurdsson, Jesper Jensen 0001, 
Deep-learning-based audio-visual speech enhancement in presence of Lombard effect.
TASLP2020
Morten Kolbæk, Zheng-Hua Tan, Søren Holdt Jensen, Jesper Jensen 0001, 
On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement.
TASLP2020
Juan M. Martín-Doñas, Jesper Jensen 0001, Zheng-Hua Tan, Angel M. Gomez, Antonio M. Peinado, 
Online Multichannel Speech Enhancement Based on Recursive EM and DNN-Based Speech Presence Estimation.
ICASSP2020
Poul Hoang, Zheng-Hua Tan, Thomas Lunner, Jan Mark de Haan, Jesper Jensen 0001, 
Maximum Likelihood Estimation of the Interference-Plus-Noise Cross Power Spectral Density Matrix for Own Voice Retrieval.
ICASSP2020
Saeid Samizade, Zheng-Hua Tan, Chao Shen 0001, Xiaohong Guan, 
Adversarial Example Detection by Classification for Deep Speech Recognition.
Interspeech2020
Daniel Michelsanti, Olga Slizovskaia, Gloria Haro, Emilia Gómez, Zheng-Hua Tan, Jesper Jensen 0001, 
Vocoder-Based Speech Synthesis from Silent Videos.
TASLP2019
Morten Kolbæk, Zheng-Hua Tan, Jesper Jensen 0001, 
On the Relationship Between Short-Time Objective Intelligibility and Short-Time Spectral-Amplitude Mean-Square Error for Speech Enhancement.
TASLP2019
Achintya Kumar Sarkar, Zheng-Hua Tan, Hao Tang 0002, Suwon Shon, James R. Glass, 
Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification.
ICASSP2019
Daniel Michelsanti, Zheng-Hua Tan, Sigurdur Sigurdsson, Jesper Jensen 0001, 
On Training Targets and Objective Functions for Deep-learning-based Audio-visual Speech Enhancement.
Interspeech2019
Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen 0001, 
Keyword Spotting for Hearing Assistive Devices Robust to External Speakers.
SpeechComm2018
Renhua Peng, Zheng-Hua Tan, Xiaodong Li 0002, Chengshi Zheng, 
A perceptually motivated LP residual estimator in noisy and reverberant environments.
SpeechComm2018
Asger Heidemann Andersen, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen 0001, 
Refinement and validation of the binaural short time objective intelligibility measure for spatially diverse conditions.
ICASSP2022
Yukun Ma, Trung Hieu Nguyen 0001, Bin Ma 0001, 
CPT: Cross-Modal Prefix-Tuning for Speech-To-Text Translation.
ICASSP2022
Fan Yu, Shiliang Zhang, Pengcheng Guo, Yihui Fu, Zhihao Du, Siqi Zheng, Weilong Huang, Lei Xie 0001, Zheng-Hua Tan, DeLiang Wang, Yanmin Qian, Kong Aik Lee, Zhijie Yan, Bin Ma 0001, Xin Xu, Hui Bu, 
Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.
ICASSP2022
Shengkui Zhao, Bin Ma 0001, Karn N. Watcharasupat, Woon-Seng Gan, 
FRCRN: Boosting Feature Representation Using Frequency Recurrence for Monaural Speech Enhancement.
ICASSP2021
Shengkui Zhao, Trung Hieu Nguyen 0001, Bin Ma 0001, 
Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses.
ICASSP2021
Shengkui Zhao, Hao Wang, Trung Hieu Nguyen 0001, Bin Ma 0001, 
Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram.
EMNLP2021
Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq R. Joty, Eng Siong Chng, Bin Ma 0001, 
A Unified Speaker Adaptation Approach for ASR.
ICASSP2020
Van Tung Pham, Haihua Xu, Yerbolat Khassanov, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma 0001, Haizhou Li 0001, 
Independent Language Modeling Architecture for End-To-End ASR.
Interspeech2020
Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq R. Joty, Eng Siong Chng, Bin Ma 0001, 
Speech Transformer with Speaker Aware Persistent Memory.
Interspeech2020
Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq R. Joty, Eng Siong Chng, Bin Ma 0001, 
Universal Speech Transformer.
Interspeech2020
Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq R. Joty, Eng Siong Chng, Bin Ma 0001, 
Cross Attention with Monotonic Alignment for Speech Transformer.
Interspeech2020
Shengkui Zhao, Trung Hieu Nguyen 0001, Hao Wang, Bin Ma 0001, 
Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion.
ICASSP2019
Shiliang Zhang, Ming Lei, Bin Ma 0001, Lei Xie 0001, 
Robust Audio-visual Speech Recognition Using Bimodal Dfsmn with Multi-condition Training and Dropout Regularization.
Interspeech2019
Yerbolat Khassanov, Haihua Xu, Van Tung Pham, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma 0001, 
Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data.
Interspeech2019
Shiliang Zhang, Yuan Liu, Ming Lei, Bin Ma 0001, Lei Xie 0001, 
Towards Language-Universal Mandarin-English Speech Recognition.
Interspeech2019
Shengkui Zhao, Chongjia Ni, Rong Tong, Bin Ma 0001, 
Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition.
Interspeech2019
Shengkui Zhao, Trung Hieu Nguyen 0001, Hao Wang, Bin Ma 0001, 
Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks.
Interspeech2018
Yougen Yuan, Cheung-Chi Leung, Lei Xie 0001, Hongjie Chen, Bin Ma 0001, Haizhou Li 0001, 
Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search.
SpeechComm2017
Chang Huai You, Bin Ma 0001, 
Spectral-domain speech enhancement for speech recognition.
TASLP2017
Hongjie Chen, Lei Xie 0001, Cheung-Chi Leung, Xiaoming Lu, Bin Ma 0001, Haizhou Li 0001, 
Modeling Latent Topics and Temporal Distance for Story Segmentation of Broadcast News.
ICASSP2017
Liping Chen, Kong Aik Lee, Bin Ma 0001, Long Ma, Haizhou Li 0001, Li-Rong Dai 0001, 
Adaptation of PLDA for multi-source text-independent speaker verification.
ICASSP2022
Hang Su, Danyang Zhao, Long Dang, Minglei Li, Xixin Wu, Xunying Liu, Helen Meng, 
A Multitask Learning Framework for Speaker Change Detection with Content Information from Unsupervised Speech Decomposition.
ICASSP2022
Disong Wang, Songxiang Liu, Xixin Wu, Hui Lu, Lifa Sun, Xunying Liu, Helen Meng, 
Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation.
ICASSP2022
Xixin Wu, Shoukang Hu, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Neural Architecture Search for Speech Emotion Recognition.
ICASSP2022
Naijun Zheng, Na Li 0012, Xixin Wu, Lingwei Meng, Jiawen Kang, Haibin Wu, Chao Weng, Dan Su 0002, Helen Meng, 
The CUHK-Tencent Speaker Diarization System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
Interspeech2022
Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu 0001, Helen Meng, 
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information.
Interspeech2022
Haohan Guo, Hui Lu, Xixin Wu, Helen Meng, 
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS.
Interspeech2022
Haohan Guo, Feng-Long Xie, Frank K. Soong, Xixin Wu, Helen Meng, 
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS.
Interspeech2022
Yi Wang, Tianzi Wang, Zi Ye, Lingwei Meng, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng, 
Exploring linguistic feature and model combination for speech recognition based automatic AD detection.
Interspeech2022
Haibin Wu, Lingwei Meng, Jiawen Kang, Jinchao Li, Xu Li, Xixin Wu, Hung-yi Lee, Helen Meng, 
Spoofing-Aware Speaker Verification by Multi-Level Fusion.
TASLP2021
Songxiang Liu, Yuewen Cao, Disong Wang, Xixin Wu, Xunying Liu, Helen Meng, 
Any-to-Many Voice Conversion With Location-Relative Sequence-to-Sequence Modeling.
TASLP2021
Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Shiyin Kang, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Exemplar-Based Emotive Speech Synthesis.
TASLP2021
Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Disong Wang, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Speech Emotion Recognition Using Sequential Capsule Networks.
Interspeech2021
Qingyun Dou, Xixin Wu, Moquan Wan, Yiting Lu, Mark J. F. Gales, 
Deliberation-Based Multi-Pass Speech Synthesis.
Interspeech2021
Xu Li, Xixin Wu, Hui Lu, Xunying Liu, Helen Meng, 
Channel-Wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks.
Interspeech2021
Hui Lu, Zhiyong Wu 0001, Xixin Wu, Xu Li, Shiyin Kang, Xunying Liu, Helen Meng, 
VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis.
Interspeech2021
Disong Wang, Songxiang Liu, Lifa Sun, Xixin Wu, Xunying Liu, Helen Meng, 
Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion.
ICASSP2020
Yuewen Cao, Songxiang Liu, Xixin Wu, Shiyin Kang, Peng Liu, Zhiyong Wu 0001, Xunying Liu, Dan Su 0002, Dong Yu 0001, Helen Meng, 
Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.
ICASSP2020
Xu Li, Jinghua Zhong, Xixin Wu, Jianwei Yu, Xunying Liu, Helen Meng, 
Adversarial Attacks on GMM I-Vector Based Speaker Verification Systems.
ICASSP2020
Songxiang Liu, Disong Wang, Yuewen Cao, Lifa Sun, Xixin Wu, Shiyin Kang, Zhiyong Wu 0001, Xunying Liu, Dan Su 0002, Dong Yu 0001, Helen Meng, 
End-To-End Accent Conversion Without Using Native Utterances.
ICASSP2020
Disong Wang, Jianwei Yu, Xixin Wu, Songxiang Liu, Lifa Sun, Xunying Liu, Helen Meng, 
End-To-End Voice Conversion Via Cross-Modal Knowledge Distillation for Dysarthric Speech Reconstruction.
ICASSP2022
Shijing Si, Jianzong Wang, Junqing Peng, Jing Xiao 0006, 
Towards Speaker Age Estimation With Label Distribution Learning.
ICASSP2022
Huaizhen Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Avqvc: One-Shot Voice Conversion By Vector Quantization With Applying Contrastive Learning.
ICASSP2022
Qiqi Wang 0005, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning.
ICASSP2022
Tong Ye, Shijing Si, Jianzong Wang, Rui Wang, Ning Cheng 0001, Jing Xiao 0006, 
VU-BERT: A Unified Framework for Visual Dialog.
ICASSP2022
Yong Zhang, Zhitao Li, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Self-Attention for Incomplete Utterance Rewriting.
ICASSP2022
Chendong Zhao, Jianzong Wang, Xiaoyang Qu, Haoqian Wang, Jing Xiao 0006, 
r-G2P: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion by Controlled Noise Introducing and Contextual Information Incorporation.
ICASSP2022
Botao Zhao, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-Shot Multi-speaker text-to-speech.
Interspeech2022
Zuheng Kang, Junqing Peng, Jianzong Wang, Jing Xiao 0006, 
SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning.
Interspeech2022
Jian Luo, Jianzong Wang, Ning Cheng 0001, Edward Xiao, Xulong Zhang 0001, Jing Xiao 0006, 
Tiny-Sepformer: A Tiny Time-Domain Transformer Network For Speech Separation.
Interspeech2022
Chenfeng Miao, Ting Chen, Minchuan Chen, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006, 
A compact transformer-based GAN vocoder.
Interspeech2022
Chenfeng Miao, Kun Zou, Ziyang Zhuang, Tao Wei, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006, 
Towards Efficiently Learning Monotonic Alignments for Attention-based End-to-End Speech Recognition.
Interspeech2022
Ye Wang, Baishun Ling, Yanmeng Wang, Junhao Xue, Shaojun Wang, Jing Xiao 0006, 
Adversarial Knowledge Distillation For Robust Spoken Language Understanding.
Interspeech2022
Tong Ye, Shijing Si, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Uncertainty Calibration for Deep Audio Classifiers.
ICASSP2021
Yanfei Hui, Jianzong Wang, Ning Cheng 0001, Fengying Yu, Tianbo Wu, Jing Xiao 0006, 
Joint Intent Detection and Slot Filling Based on Continual Learning Model.
ICASSP2021
Shuang Liang, Chenfeng Miao, Minchuan Chen, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006, 
Unsupervised Learning for Multi-Style Speech Synthesis with Limited Data.
ICASSP2021
Jian Luo, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Unidirectional Memory-Self-Attention Transducer for Online Speech Recognition.
ICASSP2021
Hao Pan, Zhongdi Chao, Jiang Qian, Bojin Zhuang, Shaojun Wang, Jing Xiao 0006, 
Network Pruning Using Linear Dependency Analysis on Feature Maps.
ICASSP2021
Zhen Zeng, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation.
Interspeech2021
Wei Chu, Peng Chang 0002, Jing Xiao 0006, 
Extending Pronunciation Dictionary with Automatically Detected Word Mispronunciations to Improve PAII's System for Interspeech 2021 Non-Native Child English Close Track ASR Challenge.
Interspeech2021
Ruchao Fan, Wei Chu, Peng Chang 0002, Jing Xiao 0006, Abeer Alwan, 
An Improved Single Step Non-Autoregressive Transformer for Automatic Speech Recognition.
TASLP2022
Dino Oglic, Zoran Cvetkovic, Peter Sollich, Steve Renals, Bin Yu 0001, 
Towards Robust Waveform-Based Acoustic Models.
Interspeech2022
Chau Luu, Steve Renals, Peter Bell 0001, 
Investigating the contribution of speaker attributes to speaker separability using disentangled speaker representations.
SpeechComm2021
Manuel Sam Ribeiro, Joanne Cleland, Aciel Eshky, Korin Richmond, Steve Renals, 
Exploiting ultrasound tongue imaging for the automatic detection of speech articulation errors.
SpeechComm2021
Aciel Eshky, Joanne Cleland, Manuel Sam Ribeiro, Eleanor Sugden, Korin Richmond, Steve Renals, 
Automatic audiovisual synchronisation for ultrasound tongue imaging.
ICASSP2021
Erfan Loweimi, Zoran Cvetkovic, Peter Bell 0001, Steve Renals, 
Speech Acoustic Modelling from Raw Phase Spectrum.
ICASSP2021
Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell 0001, Steve Renals, 
Train Your Classifier First: Cascade Neural Networks Training from Upper Layers to Lower Layers.
Interspeech2021
Erfan Loweimi, Zoran Cvetkovic, Peter Bell 0001, Steve Renals, 
Speech Acoustic Modelling Using Raw Source and Filter Components.
Interspeech2021
Chau Luu, Peter Bell 0001, Steve Renals, 
Leveraging Speaker Attribute Information Using Multi Task Learning for Speaker Verification and Diarization.
Interspeech2021
Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals, 
Silent versus Modal Multi-Speaker Speech Recognition from Ultrasound and Video.
Interspeech2021
Shucong Zhang, Erfan Loweimi, Peter Bell 0001, Steve Renals, 
Stochastic Attention Head Removal: A Simple and Effective Method for Improving Transformer Based ASR Models.
ICASSP2020
Alberto Abad, Peter Bell 0001, Andrea Carmantini, Steve Renals, 
Cross Lingual Transfer Learning for Zero-Resource Domain Adaptation.
ICASSP2020
Chau Luu, Peter Bell 0001, Steve Renals, 
Channel Adversarial Training for Speaker Verification and Diarization.
ICASSP2020
Joanna Rownicka, Peter Bell 0001, Steve Renals, 
Multi-Scale Octave Convolutions for Robust Speech Recognition.
ICASSP2020
Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Steve Renals, 
Learning Noise Invariant Features Through Transfer Learning For Robust End-to-End Speech Recognition.
Interspeech2020
Ahmed Ali 0002, Steve Renals, 
Word Error Rate Estimation Without ASR Output: e-WER2.
Interspeech2020
Neethu M. Joy, Dino Oglic, Zoran Cvetkovic, Peter Bell 0001, Steve Renals, 
Deep Scattering Power Spectrum Features for Robust Speech Recognition.
Interspeech2020
Erfan Loweimi, Peter Bell 0001, Steve Renals, 
On the Robustness and Training Dynamics of Raw Waveform Models.
Interspeech2020
Erfan Loweimi, Peter Bell 0001, Steve Renals, 
Raw Sign and Magnitude Spectra for Multi-Head Acoustic Modelling.
Interspeech2020
Dino Oglic, Zoran Cvetkovic, Peter Bell 0001, Steve Renals, 
A Deep 2D Convolutional Network for Waveform-Based Speech Recognition.
ICASSP2019
Shucong Zhang, Erfan Loweimi, Peter Bell 0001, Steve Renals, 
Windowed Attention Mechanisms for Speech Recognition.
Interspeech2022
Jason Fong, Daniel Lyth, Gustav Eje Henter, Hao Tang, Simon King, 
Speech Audio Corrector: using speech from non-target speakers for one-off correction of mispronunciations in grapheme-input text-to-speech.
Interspeech2022
Sébastien Le Maguer, Simon King, Naomi Harte, 
Back to the Future: Extending the Blizzard Challenge 2013.
Interspeech2022
Johannah O'Mahony, Catherine Lai, Simon King, 
Combining conversational speech with read speech to improve prosody in Text-to-Speech synthesis.
Interspeech2021
Devang S. Ram Mohan, Qinmin Vivian Hu, Tian Huey Teh, Alexandra Torresquintero, Christopher G. R. Wallis, Marlene Staib, Lorenzo Foglianti, Jiameng Gao, Simon King, 
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis.
Interspeech2021
Alexandra Torresquintero, Tian Huey Teh, Christopher G. R. Wallis, Marlene Staib, Devang S. Ram Mohan, Vivian Hu, Lorenzo Foglianti, Jiameng Gao, Simon King, 
ADEPT: A Dataset for Evaluating Prosody Transfer.
Interspeech2021
Cassia Valentini-Botinhao, Simon King, 
Detection and Analysis of Attention Errors in Sequence-to-Sequence Text-to-Speech.
TASLP2020
Xin Wang 0037, Shinji Takaki, Junichi Yamagishi, Simon King, Keiichi Tokuda, 
A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis.
ICASSP2020
Ivan Himawan, Sandesh Aryal, Iris Ouyang, Sam Kang, Pierre Lanchantin, Simon King, 
Speaker Adaptation of a Multilingual Acoustic Model for Cross-Language Synthesis.
Interspeech2020
Carol Chermaz, Simon King, 
A Sound Engineering Approach to Near End Listening Enhancement.
Interspeech2020
Jason Fong, Jason Taylor, Simon King, 
Testing the Limits of Representation Mixing for Pronunciation Correction in End-to-End Speech Synthesis.
Interspeech2020
Pilar Oplustil Gallegos, Jennifer Williams 0001, Joanna Rownicka, Simon King, 
An Unsupervised Method to Select a Speaker Subset from Large Multi-Speaker Speech Synthesis Datasets.
Interspeech2020
Jacob J. Webber, Olivier Perrotin, Simon King, 
Hider-Finder-Combiner: An Adversarial Architecture for General Speech Signal Modification.
ICASSP2019
Cheng-I Lai, Alberto Abad, Korin Richmond, Junichi Yamagishi, Najim Dehak, Simon King, 
Attentive Filtering Networks for Audio Replay Attack Detection.
ICASSP2019
Oliver Watts, Cassia Valentini-Botinhao, Simon King, 
Speech Waveform Reconstruction Using Convolutional Neural Networks with Noise and Periodic Inputs.
Interspeech2019
Adèle Aubin, Alessandra Cervone, Oliver Watts, Simon King, 
Improving Speech Synthesis with Discourse Relations.
Interspeech2019
Carol Chermaz, Cassia Valentini-Botinhao, Henning F. Schepker, Simon King, 
Evaluating Near End Listening Enhancement Algorithms in Realistic Environments.
Interspeech2019
Jason Fong, Pilar Oplustil Gallegos, Zack Hodari, Simon King, 
Investigating the Robustness of Sequence-to-Sequence Text-to-Speech Models to Imperfectly-Transcribed Training Data.
Interspeech2019
Avashna Govender, Anita E. Wagner, Simon King, 
Using Pupil Dilation to Measure Cognitive Load When Listening to Text-to-Speech in Quiet and in Noise.
Interspeech2019
Jennifer Williams 0001, Simon King, 
Disentangling Style Factors from Speaker Representations.
Interspeech2018
Avashna Govender, Simon King, 
Using Pupillometry to Measure the Cognitive Load of Synthetic Speech.
ICASSP2022
Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan 0003, Brian Yan, Ngoc Thang Vu, Alan W. Black, Shinji Watanabe 0001, 
ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet.
ICASSP2022
Roshan Sharma, Shruti Palaskar, Alan W. Black, Florian Metze, 
End-to-End Speech Summarization Using Restricted Self-Attention.
Interspeech2022
Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan W. Black, Shinji Watanabe 0001, 
Two-Pass Low Latency End-to-End Spoken Language Understanding.
Interspeech2022
Xinjian Li, Florian Metze, David R. Mortensen, Alan W. Black, Shinji Watanabe 0001, 
ASR2K: Speech Recognition for Around 2000 Languages without Audio.
Interspeech2022
Jiachen Lian, Alan W. Black, Louis Goldstein, Gopala Krishna Anumanchipalli, 
Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition.
Interspeech2022
Perez Ogayo, Graham Neubig, Alan W. Black, 
Building African Voices.
Interspeech2022
Peter Wu, Shinji Watanabe 0001, Louis Goldstein, Alan W. Black, Gopala Krishna Anumanchipalli, 
Deep Speech Synthesis from Articulatory Representations.
Interspeech2022
Hemant Yadav, Akshat Gupta, Sai Krishna Rallabandi, Alan W. Black, Rajiv Ratn Shah, 
Intent classification using pre-trained language agnostic embeddings for low resource languages.
ICASSP2021
Akshat Gupta, Xinjian Li, Sai Krishna Rallabandi, Alan W. Black, 
Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages.
ICASSP2021
Xinjian Li, David R. Mortensen, Florian Metze, Alan W. Black, 
Multilingual Phonetic Dataset for Low Resource Speech Recognition.
Interspeech2021
Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan 0002, Siddharth Dalmia, Florian Metze, Shinji Watanabe 0001, Alan W. Black, 
Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding.
Interspeech2021
Xinjian Li, Juncheng Li 0001, Florian Metze, Alan W. Black, 
Hierarchical Phone Recognition with Compositional Phonetics.
Interspeech2021
Shruti Palaskar, Ruslan Salakhutdinov, Alan W. Black, Florian Metze, 
Multimodal Speech Summarization Through Semantic Concept Learning.
TASLP2020
Odette Scharenborg, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller 0001, 
Speech Technology for Unwritten Languages.
ICASSP2020
Xinjian Li, Siddharth Dalmia, Juncheng Li 0001, Matthew Lee 0012, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W. Black, Florian Metze, 
Universal Phone Recognition with a Multilingual Allophone System.
Interspeech2020
Khyathi Raghavi Chandu, Alan W. Black, 
Style Variation as a Vantage Point for Code-Switching.
Interspeech2020
Amrith Setlur, Barnabás Póczos, Alan W. Black, 
Nonlinear ISA with Auxiliary Variables for Learning Speech Representations.
ACL2020
Elizabeth Salesky, Alan W. Black, 
Phone Features Improve Speech Translation.
ICASSP2019
Alan W. Black, 
CMU Wilderness Multilingual Speech Dataset.
ICASSP2019
Siddharth Dalmia, Xinjian Li, Alan W. Black, Florian Metze, 
Phoneme Level Language Models for Sequence Based Low Resource ASR.
TASLP2022
Juliano G. C. Ribeiro, Natsuki Ueno, Shoichi Koyama, Hiroshi Saruwatari, 
Region-to-Region Kernel Interpolation of Acoustic Transfer Functions Constrained by Physical Properties.
Interspeech2022
Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari, 
Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis.
Interspeech2022
Yuto Nishimura, Yuki Saito, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari, 
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History.
Interspeech2022
Takaaki Saeki, Shinnosuke Takamichi, Tomohiko Nakamura, Naoko Tanji, Hiroshi Saruwatari, 
SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling.
Interspeech2022
Takaaki Saeki, Detai Xin, Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari, 
UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022.
Interspeech2022
Yuki Saito, Yuto Nishimura, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari, 
STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent.
Interspeech2022
Shinnosuke Takamichi, Wataru Nakata, Naoko Tanji, Hiroshi Saruwatari, 
J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis.
Interspeech2022
Kenta Udagawa, Yuki Saito, Hiroshi Saruwatari, 
Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS.
SpeechComm2021
Masashi Aso, Shinnosuke Takamichi, Norihiro Takamune, Hiroshi Saruwatari, 
Acoustic model-based subword tokenization and prosodic-context extraction without language knowledge for text-to-speech synthesis.
SpeechComm2021
Kentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari, 
Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation.
TASLP2021
Shoichi Koyama, Jesper Brunnström, Hayato Ito, Natsuki Ueno, Hiroshi Saruwatari, 
Spatial Active Noise Control Based on Kernel Interpolation of Sound Field.
TASLP2021
Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Perceptual-Similarity-Aware Deep Speaker Representation Learning for Multi-Speaker Generative Modeling.
ICASSP2021
Yuto Kondo, Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, 
Deficient Basis Estimation of Noise Spatial Covariance Matrix for Rank-Constrained Spatial Covariance Matrix Estimation Method in Blind Speech Extraction.
ICASSP2021
Shoichi Koyama, Takashi Amakasu, Natsuki Ueno, Hiroshi Saruwatari, 
Amplitude Matching: Majorization-Minimization Algorithm for Sound Field Control Only with Amplitude Constraint.
ICASSP2021
Detai Xin, Tatsuya Komatsu, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Disentangled Speaker and Language Representations Using Mutual Information Minimization and Domain Adaptation for Cross-Lingual TTS.
Interspeech2021
Kazuki Mizuta, Tomoki Koriyama, Hiroshi Saruwatari, 
Harmonic WaveGAN: GAN-Based Speech Waveform Generation Model with Harmonic Structure Discriminator.
Interspeech2021
Taiki Nakamura, Tomoki Koriyama, Hiroshi Saruwatari, 
Sequence-to-Sequence Learning for Deep Gaussian Process Based Speech Synthesis Using Self-Attention GP Layer.
Interspeech2021
Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari, 
Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis.
TASLP2020
Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, 
Blind Speech Extraction Based on Rank-Constrained Spatial Covariance Matrix Estimation With Multivariate Generalized Gaussian Distribution.
ICASSP2020
Hayato Ito, Shoichi Koyama, Natsuki Ueno, Hiroshi Saruwatari, 
Spatial Active Noise Control Based on Kernel Interpolation with Directional Weighting.
ICASSP2022
Huang-Cheng Chou, Wei-Cheng Lin, Chi-Chun Lee, Carlos Busso, 
Exploiting Annotators' Typed Description of Emotion Perception to Maximize Utilization of Ratings for Speech Emotion Recognition.
ICASSP2022
Ya-Tse Wu, Jeng-Lin Li, Chi-Chun Lee, 
An Audio-Saliency Masking Transformer for Audio Emotion Classification in Movies.
Interspeech2022
Chun-Yu Chen, Yun-Shao Lin, Chi-Chun Lee, 
Emotion-Shift Aware CRF for Decoding Emotion Sequence in Conversation.
Interspeech2022
Huang-Cheng Chou, Chi-Chun Lee, Carlos Busso, 
Exploiting Co-occurrence Frequency of Emotions in Perceptual Evaluations To Train A Speech Emotion Classifier.
Interspeech2022
Yu-Lin Huang, Bo-Hao Su, Y.-W. Peter Hong, Chi-Chun Lee, 
An Attention-Based Method for Guiding Attribute-Aligned Speech Representation Learning.
Interspeech2022
Bo-Hao Su, Chi-Chun Lee, 
Vaccinating SER to Neutralize Adversarial Attacks with Self-Supervised Augmentation Strategy.
Interspeech2021
Yu-Lin Huang, Bo-Hao Su, Y.-W. Peter Hong, Chi-Chun Lee, 
An Attribute-Aligned Strategy for Learning Speech Representation.
ICASSP2020
Ya-Lin Huang, Wan-Ting Hsieh, Hao-Chun Yang, Chi-Chun Lee, 
Conditional Domain Adversarial Transfer for Robust Cross-Site ADHD Classification Using Functional MRI.
ICASSP2020
Yun-Shao Lin, Chi-Chun Lee, 
Predicting Performance Outcome with a Conversational Graph Convolutional Network for Small Group Interactions.
ICASSP2020
Hao-Chun Yang, Chi-Chun Lee, 
A Siamese Content-Attentive Graph Convolutional Network for Personality Recognition Using Physiology.
ICASSP2020
Sung-Lin Yeh, Yun-Shao Lin, Chi-Chun Lee, 
A Dialogical Emotion Decoder for Speech Emotion Recognition in Spoken Dialog.
Interspeech2020
Huang-Cheng Chou, Chi-Chun Lee, 
Learning to Recognize Per-Rater's Emotion Perception Using Co-Rater Training Strategy with Soft and Hard Labels.
Interspeech2020
Jeng-Lin Li, Chi-Chun Lee, 
Using Speaker-Aligned Graph Memory Block in Multimodally Attentive Emotion Recognition Network.
Interspeech2020
Bo-Hao Su, Chun-Min Chang, Yun-Shao Lin, Chi-Chun Lee, 
Improving Speech Emotion Recognition Using Graph Attentive Bi-Directional Gated Recurrent Unit Network.
Interspeech2020
Shreya G. Upadhyay, Bo-Hao Su, Chi-Chun Lee, 
Attentive Convolutional Recurrent Neural Network Using Phoneme-Level Acoustic Representation for Rare Sound Event Detection.
Interspeech2020
Sung-Lin Yeh, Yun-Shao Lin, Chi-Chun Lee, 
Speech Representation Learning for Emotion Recognition Using End-to-End ASR with Factorized Adaptation.
Interspeech2020
Shun-Chang Zhong, Bo-Hao Su, Wei Huang, Yi-Ching Liu, Chi-Chun Lee, 
Predicting Collaborative Task Performance Using Graph Interlocutor Acoustic Network in Small Group Interaction.
ICASSP2019
Chun-Min Chang, Chi-Chun Lee, 
Adversarially-enriched Acoustic Code Vector Learned from Out-of-context Affective Corpus for Robust Emotion Recognition.
ICASSP2019
Huang-Cheng Chou, Chi-Chun Lee, 
Every Rating Matters: Joint Learning of Subjective Labels and Individual Annotators for Speech Emotion Classification.
ICASSP2019
Sung-Lin Yeh, Yun-Shao Lin, Chi-Chun Lee, 
An Interaction-aware Attention Network for Speech Emotion Recognition in Spoken Dialogs.
ICASSP2022
Zvi Kons, Aharon Satt, Hong-Kwang Kuo, Samuel Thomas 0001, Boaz Carmeli, Ron Hoory, Brian Kingsbury, 
A New Data Augmentation Method for Intent Classification Enhancement and its Application on Spoken Conversation Datasets.
ICASSP2022
Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas 0001, Brian Kingsbury, George Saon, 
Improving End-to-end Models for Set Prediction in Spoken Language Understanding.
ICASSP2022
Vishal Sunder, Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Jatin Ganhotra, Brian Kingsbury, Eric Fosler-Lussier, 
Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding.
ICASSP2022
Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Brian Kingsbury, George Saon, 
Towards Reducing the Need for Speech Training Data to Build Spoken Language Understanding Systems.
ICASSP2022
Samuel Thomas 0001, Brian Kingsbury, George Saon, Hong-Kwang Jeff Kuo, 
Integrating Text Inputs for Training and Adapting RNN Transducer ASR Models.
Interspeech2022
Xiaodong Cui, George Saon, Tohru Nagano, Masayuki Suzuki, Takashi Fukuda, Brian Kingsbury, Gakuto Kurata, 
Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing.
Interspeech2022
Andrea Fasoli, Chia-Yu Chen, Mauricio J. Serrano, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Kailash Gopalakrishnan, 
Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization.
Interspeech2022
Takashi Fukuda, Samuel Thomas 0001, Masayuki Suzuki, Gakuto Kurata, George Saon, Brian Kingsbury, 
Global RNN Transducer Models For Multi-dialect Speech Recognition.
Interspeech2022
Jiatong Shi, George Saon, David Haws, Shinji Watanabe 0001, Brian Kingsbury, 
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States.
Interspeech2022
Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas 0001, Hong-Kwang Kuo, Brian Kingsbury, 
Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems.
ICASSP2021
Samuel Thomas 0001, Hong-Kwang Jeff Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, 
RNN Transducer Models for Spoken Language Understanding.
ICASSP2021
Edmilson da Silva Morais, Hong-Kwang Jeff Kuo, Samuel Thomas 0001, Zoltán Tüske, Brian Kingsbury, 
End-to-End Spoken Language Understanding Using Transformer Networks and Self-Supervised Pre-Trained Features.
ICASSP2021
George Saon, Zoltán Tüske, Daniel Bolaños, Brian Kingsbury, 
Advancing RNN Transducer Technology for Speech Recognition.
Interspeech2021
Xiaodong Cui, Brian Kingsbury, George Saon, David Haws, Zoltán Tüske, 
Reducing Exposure Bias in Training Recurrent Neural Network Transducers.
Interspeech2021
Andrea Fasoli, Chia-Yu Chen, Mauricio J. Serrano, Xiao Sun, Naigang Wang, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Wei Zhang 0022, Zoltán Tüske, Kailash Gopalakrishnan, 
4-Bit Quantization of LSTM-Based Speech Recognition Models.
Interspeech2021
Jatin Ganhotra, Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Sachindra Joshi, George Saon, Zoltán Tüske, Brian Kingsbury, 
Integrating Dialog History into End-to-End Spoken Language Understanding Systems.
Interspeech2021
Gakuto Kurata, George Saon, Brian Kingsbury, David Haws, Zoltán Tüske, 
Improving Customization of Neural Transducers by Mitigating Acoustic Mismatch of Synthesized Audio.
Interspeech2021
Andrew Rouditchenko, Angie W. Boggust, David Harwath, Samuel Thomas 0001, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogério Feris, Brian Kingsbury, Michael Picheny, James R. Glass, 
Cascaded Multilingual Audio-Visual Learning from Videos.
Interspeech2021
Andrew Rouditchenko, Angie W. Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas 0001, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogério Schmidt Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba 0001, James R. Glass, 
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.
Interspeech2021
Zoltán Tüske, George Saon, Brian Kingsbury, 
On the Limit of English Conversational Speech Recognition.
TASLP2022
Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.
ICASSP2022
Guinan Li, Jianwei Yu, Jiajun Deng, Xunying Liu, Helen Meng, 
Audio-Visual Multi-Channel Speech Separation, Dereverberation and Recognition.
ICASSP2022
Junhao Xu, Jianwei Yu, Xunying Liu, Helen Meng, 
Mixed Precision DNN Quantization for Overlapped Speech Separation and Recognition.
ICASSP2022
Naijun Zheng, Na Li 0012, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
Multi-Channel Speaker Diarization Using Spatial Features for Meetings.
Interspeech2022
Ziqian Dai, Jianwei Yu, Yan Wang, Nuo Chen, Yanyao Bian, Guangzhi Li, Deng Cai 0002, Dong Yu 0001, 
Automatic Prosody Annotation with Pre-Trained Text-Speech Model.
Interspeech2022
Lingyun Feng, Jianwei Yu, Yan Wang, Songxiang Liu, Deng Cai 0002, Haitao Zheng, 
ASR-Robust Natural Language Understanding on ASR-GLUE dataset.
Interspeech2022
Jinchuan Tian, Jianwei Yu, Chunlei Zhang, Yuexian Zou, Dong Yu 0001, 
LAE: Language-Aware Encoder for Monolingual and Multilingual ASR.
Interspeech2022
Helin Wang, Dongchao Yang, Chao Weng, Jianwei Yu, Yuexian Zou, 
Improving Target Sound Extraction with Timestamp Information.
TASLP2021
Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng, 
Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition.
TASLP2021
Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng, 
Recent Progress in the CUHK Dysarthric Speech Recognition System.
TASLP2021
Junhao Xu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng, 
Mixed Precision Low-Bit Quantization of Neural Network Language Models for Speech Recognition.
TASLP2021
Jianwei Yu, Shi-Xiong Zhang, Bo Wu, Shansong Liu, Shoukang Hu, Mengzhe Geng, Xunying Liu, Helen Meng, Dong Yu 0001, 
Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech.
ICASSP2021
Jinchao Li, Jianwei Yu, Zi Ye, Simon Wong, Man-Wai Mak, Brian Mak, Xunying Liu, Helen Meng, 
A Comparative Study of Acoustic and Linguistic Features Classification for Alzheimer's Disease Detection.
ICASSP2021
Junhao Xu, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen Meng, 
Mixed Precision Quantization of Transformer Language Models for Speech Recognition.
ICASSP2021
Zi Ye, Shoukang Hu, Jinchao Li, Xurong Xie, Mengzhe Geng, Jianwei Yu, Junhao Xu, Boyang Xue, Shansong Liu, Xunying Liu, Helen Meng, 
Development of the CUHK Elderly Speech Recognition System for Neurocognitive Disorder Detection Using the DementiaBank Corpus.
ICASSP2021
Naijun Zheng, Na Li 0012, Bo Wu, Meng Yu 0003, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
A Joint Training Framework of Multi-Look Separator and Speaker Embedding Extractor for Overlapped Speech.
Interspeech2021
Jiajun Deng, Fabian Ritter Gutierrez, Shoukang Hu, Mengzhe Geng, Xurong Xie, Zi Ye, Shansong Liu, Jianwei Yu, Xunying Liu, Helen Meng, 
Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and Dysarthric Speech Recognition.
Interspeech2021
Mengzhe Geng, Shansong Liu, Jianwei Yu, Xurong Xie, Shoukang Hu, Zi Ye, Zengrui Jin, Xunying Liu, Helen Meng, 
Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition.
Interspeech2021
Zengrui Jin, Mengzhe Geng, Xurong Xie, Jianwei Yu, Shansong Liu, Xunying Liu, Helen Meng, 
Adversarial Data Augmentation for Disordered Speech Recognition.
Interspeech2021
Helin Wang, Bo Wu, Lianwu Chen, Meng Yu 0003, Jianwei Yu, Yong Xu 0004, Shi-Xiong Zhang, Chao Weng, Dan Su 0002, Dong Yu 0001, 
TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation.
ICASSP2022
Metehan Cekic, Ruirui Li 0002, Zeya Chen, Yuguang Yang 0004, Andreas Stolcke, Upamanyu Madhow, 
Self-Supervised Speaker Recognition Training using Human-Machine Dialogues.
ICASSP2022
Aparna Khare, Eunjung Han, Yuguang Yang 0004, Andreas Stolcke, 
ASR-Aware End-to-End Neural Diarization.
ICASSP2022
K. C. Kishan, Zhenning Tan, Long Chen, Minho Jin, Eunjung Han, Andreas Stolcke, Chul Lee, 
OpenFEAT: Improving Speaker Identification by Open-Set Few-Shot Embedding Adaptation with Transformer.
ICASSP2022
Hua Shen, Yuguang Yang 0004, Guoli Sun, Ryan Langman, Eunjung Han, Jasha Droppo, Andreas Stolcke, 
Improving Fairness in Speaker Verification via Group-Adapted Fusion Network.
ICASSP2022
Liyan Xu, Yile Gu, Jari Kolehmainen, Haidar Khan, Ankur Gandhe, Ariya Rastrow, Andreas Stolcke, Ivan Bulyko, 
RescoreBERT: Discriminative Speech Recognition Rescoring With BERT.
ICASSP2022
Chao-Han Huck Yang, Zeeshan Ahmed, Yile Gu, Joseph Szurley, Roger Ren, Linda Liu, Andreas Stolcke, Ivan Bulyko, 
Mitigating Closed-Model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition.
ICASSP2022
Xin Zhang, Minho Jin, Roger Cheng, Ruirui Li 0002, Eunjung Han, Andreas Stolcke, 
Contrastive-mixup Learning for Improved Speaker Verification.
Interspeech2022
Long Chen, Yixiong Meng, Venkatesh Ravichandran, Andreas Stolcke, 
Graph-based Multi-View Fusion and Local Adaptation: Mitigating Within-Household Confusability for Speaker Identification.
Interspeech2022
Pranav Dheram, Murugesan Ramakrishnan, Anirudh Raju, I-Fan Chen, Brian King, Katherine Powell, Melissa Saboowala, Karan Shetty, Andreas Stolcke, 
Toward Fairness in Speech Recognition: Discovery and Mitigation of Performance Disparities.
Interspeech2022
Minho Jin, Chelsea Ju, Zeya Chen, Yi-Chieh Liu, Jasha Droppo, Andreas Stolcke, 
Adversarial Reweighting for Speaker Verification Fairness.
Interspeech2022
Viet Anh Trinh, Pegah Ghahremani, Brian King, Jasha Droppo, Andreas Stolcke, Roland Maas, 
Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation.
ICASSP2021
Eunjung Han, Chul Lee, Andreas Stolcke, 
BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers.
ICASSP2021
Hu Hu, Xuesong Yang, Zeynab Raeesy, Jinxi Guo, Gokce Keskin, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Roland Maas, 
REDAT: Accent-Invariant Representation for End-To-End ASR by Domain Adversarial Training with Relabeling.
ICASSP2021
Mao Li, Bo Yang, Joshua Levy, Andreas Stolcke, Viktor Rozgic, Spyros Matsoukas, Constantinos Papayiannis, Daniel Bone, Chao Wang 0018, 
Contrastive Unsupervised Learning for Speech Emotion Recognition.
ICASSP2021
Surabhi Punjabi, Harish Arsikere, Zeynab Raeesy, Chander Chandak, Nikhil Bhave, Ankish Bansal, Markus Müller, Sergio Murillo, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Sri Garimella, Roland Maas, Mat Hans, Athanasios Mouchtaris, Siegfried Kunzmann, 
Joint ASR and Language Identification Using RNN-T: An Efficient Approach to Dynamic Language Switching.
ICASSP2021
Milind Rao, Pranav Dheram, Gautam Tiwari, Anirudh Raju, Jasha Droppo, Ariya Rastrow, Andreas Stolcke, 
DO as I Mean, Not as I Say: Sequence Loss Training for Spoken Language Understanding.
Interspeech2021
Long Chen, Venkatesh Ravichandran, Andreas Stolcke, 
Graph-Based Label Propagation for Semi-Supervised Speaker Identification.
Interspeech2021
Ruirui Li 0002, Chelsea J.-T. Ju, Zeya Chen, Hongda Mao, Oguz Elibol, Andreas Stolcke, 
Fusion of Embeddings Networks for Robust Combination of Text Dependent and Independent Speaker Recognition.
Interspeech2021
Yi-Chieh Liu, Eunjung Han, Chul Lee, Andreas Stolcke, 
End-to-End Neural Diarization: From Transformer to Conformer.
Interspeech2021
Swayambhu Nath Ray, Minhua Wu, Anirudh Raju, Pegah Ghahremani, Raghavendra Bilgi, Milind Rao, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, 
Listen with Intent: Improving Speech Recognition with Audio-to-Intent Front-End.
ICASSP2022
Han Chen, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Self-Supervised Representation Learning for Unsupervised Anomalous Sound Detection Under Domain Shift.
ICASSP2022
Hang-Rui Hu, Yan Song 0001, Ying Liu, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Domain Robust Deep Embedding Learning for Speaker Recognition.
ICASSP2022
Yuxuan Xi, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Frontend Attributes Disentanglement for Speech Emotion Recognition.
Interspeech2022
Zhifu Gao, Shiliang Zhang, Ian McLoughlin 0001, Zhijie Yan, 
Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition.
Interspeech2022
Hang-Rui Hu, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Class-Aware Distribution Alignment based Unsupervised Domain Adaptation for Speaker Verification.
TASLP2021
Jian Tang, Jie Zhang 0042, Yan Song 0001, Ian McLoughlin 0001, Li-Rong Dai 0001, 
Multi-Granularity Sequence Alignment Mapping for Encoder-Decoder Based End-to-End ASR.
ICASSP2021
Ying Liu, Yan Song 0001, Ian McLoughlin 0001, Lin Liu, Li-Rong Dai 0001, 
An Effective Deep Embedding Learning Method Based on Dense-Residual Networks for Speaker Verification.
ICASSP2021
Huy Phan, Huy Le Nguyen, Oliver Y. Chén, Philipp Koch, Ngoc Q. K. Duong, Ian McLoughlin 0001, Alfred Mertins, 
Self-Attention Generative Adversarial Network for Speech Enhancement.
ICASSP2021
Huy Phan, Huy Le Nguyen, Oliver Y. Chén, Lam Dang Pham, Philipp Koch, Ian McLoughlin 0001, Alfred Mertins, 
Multi-View Audio And Music Classification.
Interspeech2021
Zhifu Gao, Yiwu Yao, Shiliang Zhang, Jun Yang, Ming Lei, Ian McLoughlin 0001, 
Extremely Low Footprint End-to-End ASR System for Smart Device.
Interspeech2021
Hui Wang, Lin Liu, Yan Song 0001, Lei Fang, Ian McLoughlin 0001, Li-Rong Dai 0001, 
A Weight Moving Average Based Alternate Decoupled Learning Algorithm for Long-Tailed Language Identification.
Interspeech2021
Xu Zheng, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
An Effective Mutual Mean Teaching Based Domain Adaptation Method for Sound Event Detection.
TASLP2020
Olivier Perrotin, Ian Vince McLoughlin, 
Glottal Flow Synthesis for Whisper-to-Speech Conversion.
ICASSP2020
Hui Wang, Yan Song 0001, Zengxi Li, Ian McLoughlin 0001, Li-Rong Dai 0001, 
An Online Speaker-aware Speech Separation Approach Based on Time-domain Representation.
ICASSP2020
Jie Yan, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, 
Task-Aware Mean Teacher Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection.
Interspeech2020
Zhifu Gao, Shiliang Zhang, Ming Lei, Ian McLoughlin 0001, 
SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition.
Interspeech2020
Ying Liu, Yan Song 0001, Yiheng Jiang, Ian McLoughlin 0001, Lin Liu, Li-Rong Dai 0001, 
An Effective Speaker Recognition Method Based on Joint Identification and Verification Supervisions.
Interspeech2020
Han Tong, Hamid R. Sharifzadeh, Ian McLoughlin 0001, 
Automatic Assessment of Dysarthric Severity Level Using Audio-Video Cross-Modal Approach in Deep Learning.
Interspeech2020
Zi-qiang Zhang, Yan Song 0001, Jian-Shu Zhang, Ian McLoughlin 0001, Li-Rong Dai 0001, 
Semi-Supervised End-to-End ASR via Teacher-Student Learning with Conditional Posterior Distribution.
Interspeech2020
Xu Zheng, Yan Song 0001, Jie Yan, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection.
ICASSP2022
Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang 0001, 
Diverse Audio Captioning Via Adversarial Training.
ICASSP2022
Dongchao Yang, Helin Wang, Yuexian Zou, Zhongjie Ye, Wenwu Wang 0001, 
A Mutual Learning Framework for Few-Shot Sound Event Detection.
ICASSP2022
Jinzheng Zhao, Peipei Wu, Xubo Liu, Yong Xu 0004, Lyudmila Mihaylova, Simon J. Godsill, Wenwu Wang 0001, 
Audio-Visual Tracking of Multiple Speakers Via a PMBM Filter.
Interspeech2022
Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley, Wenwu Wang 0001, 
Separate What You Describe: Language-Queried Audio Source Separation.
Interspeech2022
Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang 0001, 
On Metric Learning for Audio-Text Cross-Modal Retrieval.
Interspeech2022
Dongchao Yang, Helin Wang, Zhongjie Ye, Yuexian Zou, Wenwu Wang 0001, 
RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection.
Interspeech2022
Jinzheng Zhao, Peipei Wu, Xubo Liu, Shidrokh Goudarzi, Haohe Liu, Yong Xu 0004, Wenwu Wang 0001, 
Audio Visual Multi-Speaker Tracking with Improved GCF and PMBM Filter.
TASLP2021
Weitao Yuan, Bofei Dong, Shengbei Wang, Masashi Unoki, Wenwu Wang 0001, 
Evolving Multi-Resolution Pooling CNN for Monaural Singing Voice Separation.
ICASSP2021
Shuoyang Li, Yuhui Luo, Jonathon A. Chambers, Wenwu Wang 0001, 
Dimension Selected Subspace Clustering.
ICASSP2021
Helin Wang, Yuexian Zou, Wenwu Wang 0001, 
A Global-Local Attention Framework for Weakly Labelled Audio Tagging.
Interspeech2021
Helin Wang, Yuexian Zou, Wenwu Wang 0001, 
SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification.
Interspeech2021
Weitao Yuan, Shengbei Wang, Xiangrui Li, Masashi Unoki, Wenwu Wang 0001, 
Crossfire Conditional Generative Adversarial Networks for Singing Voice Extraction.
TASLP2020
Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang 0002, Wenwu Wang 0001, Mark D. Plumbley, 
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition.
ICASSP2020
Jian Guan, Jiabei Liu, Jianguo Sun, Pengming Feng, Tong Shuai, Wenwu Wang 0001, 
Meta Metric Learning for Highly Imbalanced Aerial Scene Classification.
ICASSP2020
Sixin Hong, Yuexian Zou, Wenwu Wang 0001, Meng Cao, 
Weakly Labelled Audio Tagging Via Convolutional Networks with Spatial and Channel-Wise Attention.
ICASSP2020
Turab Iqbal, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang 0001, 
Learning With Out-of-Distribution Data for Audio Classification.
ICASSP2020
Takahiro Murakami, Wenwu Wang 0001, 
An Analytical Solution to Jacobsen Estimator for Windowed Signals.
Interspeech2020
Sixin Hong, Yuexian Zou, Wenwu Wang 0001, 
Gated Multi-Head Attention Pooling for Weakly Labelled Audio Tagging.
Interspeech2020
Helin Wang, Yuexian Zou, Dading Chong, Wenwu Wang 0001, 
Environmental Sound Classification with Parallel Temporal-Spectral Attention.
TASLP2019
Qiuqiang Kong, Changsong Yu, Yong Xu 0004, Turab Iqbal, Wenwu Wang 0001, Mark D. Plumbley, 
Weakly Labelled AudioSet Tagging With Attention Neural Networks.
TASLP2022
Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura 0001, 
A Machine Speech Chain Approach for Dynamically Adaptive Lombard TTS in Static and Dynamic Noise Environments.
TASLP2022
Bin Wu, Sakriani Sakti, Jinsong Zhang 0001, Satoshi Nakamura 0001, 
Modeling Unsupervised Empirical Adaptation by DPGMM and DPGMM-RNN Hybrid Model to Extract Perceptual Features for Low-Resource ASR.
Interspeech2022
Heli Qi, Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura 0001, 
Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR via Speech Chain Reconstruction and Self-Transcribing.
TASLP2021
Bin Wu, Sakriani Sakti, Jinsong Zhang 0001, Satoshi Nakamura 0001, 
Tackling Perception Bias in Unsupervised Phoneme Discovery Using DPGMM-RNN Hybrid Model and Functional Load.
Interspeech2021
Johanes Effendi, Sakriani Sakti, Satoshi Nakamura 0001, 
Weakly-Supervised Speech-to-Text Mapping with Visually Connected Non-Parallel Speech-Text Data Using Cyclic Partially-Aligned Transformer.
Interspeech2021
Yuka Ko, Katsuhito Sudoh, Sakriani Sakti, Satoshi Nakamura 0001, 
ASR Posterior-Based Loss for Multi-Task End-to-End Speech Translation.
Interspeech2021
Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura 0001, 
Dynamically Adaptive Machine Speech Chain Inference for TTS in Noisy Environment: Listen and Speak Louder.
Interspeech2021
Shun Takahashi, Sakriani Sakti, Satoshi Nakamura 0001, 
Unsupervised Neural-Based Graph Clustering for Variable-Length Speech Representation Discovery of Zero-Resource Languages.
Interspeech2021
Hirotaka Tokuyama, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura 0001, 
Transcribing Paralinguistic Acoustic Cues to Target Language Text in Transformer-Based Speech-to-Text Translation.
TASLP2020
Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001, 
Machine Speech Chain.
TASLP2020
Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001, 
Corrections to "Machine Speech Chain".
Interspeech2020
Ewan Dunbar, Julien Karadayi, Mathieu Bernard, Xuan-Nga Cao, Robin Algayres, Lucas Ondel, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux, 
The Zero Resource Speech Challenge 2020: Discovering Discrete Subword and Word Units.
Interspeech2020
Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001, 
Augmenting Images for ASR and TTS Through Single-Loop and Dual-Loop Multimodal Chain Framework.
Interspeech2020
Sashi Novitasari, Andros Tjandra, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura 0001, 
Incremental Machine Speech Chain Towards Enabling Listening While Speaking in Real-Time.
Interspeech2020
Ivan Halim Parmonangan, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura 0001, 
Combining Audio and Brain Activity for Predicting Speech Quality.
Interspeech2020
Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001, 
Transformer VQ-VAE for Unsupervised Unit Discovery and Speech Synthesis: ZeroSpeech 2020 Challenge.
Interspeech2020
Kazuki Tsunematsu, Johanes Effendi, Sakriani Sakti, Satoshi Nakamura 0001, 
Neural Speech Completion.
TASLP2019
Nurul Lubis, Sakriani Sakti, Koichiro Yoshino, Satoshi Nakamura 0001, 
Positive Emotion Elicitation in Chat-Based Dialogue Systems.
ICASSP2019
Holy Lovenia, Hiroki Tanaka, Sakriani Sakti, Ayu Purwarianti, Satoshi Nakamura 0001, 
Speech Artifact Removal from EEG Recordings of Spoken Word Production with Tensor Decomposition.
ICASSP2019
Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001, 
End-to-end Feedback Loss in Speech Chain Framework via Straight-through Estimator.
TASLP2022
Brij Mohan Lal Srivastava, Mohamed Maouche, Md. Sahidullah, Emmanuel Vincent 0001, Aurélien Bellet, Marc Tommasi, Natalia A. Tomashenko, Xin Wang 0037, Junichi Yamagishi, 
Privacy and Utility of X-Vector Based Speaker Anonymization.
ICASSP2022
Xin Wang 0037, Junichi Yamagishi, 
Estimating the Confidence of Speech Spoofing Countermeasure.
ICASSP2022
Chang Zeng, Xin Wang 0037, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi, 
Attention Back-End for Automatic Speaker Verification with Multiple Enrollment Utterances.
Interspeech2022
Xiaoxiao Miao, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, Natalia A. Tomashenko, 
Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions.
ICASSP2021
Shuhei Kato, Yusuke Yasuda, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, 
How Similar or Different is Rakugo Speech Synthesizer to Professional Performers?
ICASSP2021
Yusuke Yasuda, Xin Wang 0037, Junichi Yamagishi, 
End-to-End Text-to-Speech Using Latent Duration Based on VQ-VAE.
Interspeech2021
Tomi Kinnunen, Andreas Nautsch, Md. Sahidullah, Nicholas W. D. Evans, Xin Wang 0037, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee, 
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing.
Interspeech2021
Xin Wang 0037, Junichi Yamagishi, 
A Comparative Study on Recent Neural Spoofing Countermeasures for Synthetic Speech Detection.
Interspeech2021
Lin Zhang, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, Jose Patino 0001, Nicholas W. D. Evans, 
An Initial Investigation for Detecting Partially Spoofed Audio.
TASLP2020
Tomi Kinnunen, Héctor Delgado, Nicholas W. D. Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang 0037, Md. Sahidullah, Junichi Yamagishi, Douglas A. Reynolds, 
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals.
TASLP2020
Xin Wang 0037, Shinji Takaki, Junichi Yamagishi, 
Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis.
TASLP2020
Xin Wang 0037, Shinji Takaki, Junichi Yamagishi, Simon King, Keiichi Tokuda, 
A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis.
ICASSP2020
Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Fuming Fang, Xin Wang 0037, Nanxin Chen, Junichi Yamagishi, 
Zero-Shot Multi-Speaker Text-To-Speech with State-Of-The-Art Neural Speaker Embeddings.
ICASSP2020
Xin Wang 0037, Jun Du, Alejandrina Cristià, Lei Sun, Chin-Hui Lee, 
A Study of Child Speech Extraction Using Joint Speech Enhancement and Separation in Realistic Conditions.
ICASSP2020
Yusuke Yasuda, Xin Wang 0037, Junichi Yamagishi, 
Effect of Choice of Probability Distribution, Randomness, and Search Methods for Alignment Modeling in Sequence-to-Sequence Text-to-Speech Synthesis Using Hard Alignment.
ICASSP2020
Yi Zhao 0006, Xin Wang 0037, Lauri Juvela, Junichi Yamagishi, 
Transferring Neural Speech Waveform Synthesizers to Musical Instrument Sounds Generation.
Interspeech2020
Yang Ai, Xin Wang 0037, Junichi Yamagishi, Zhen-Hua Ling, 
Reverberation Modeling for Source-Filter-Based Neural Vocoder.
Interspeech2020
Brij Mohan Lal Srivastava, Natalia A. Tomashenko, Xin Wang 0037, Emmanuel Vincent 0001, Junichi Yamagishi, Mohamed Maouche, Aurélien Bellet, Marc Tommasi, 
Design Choices for X-Vector Based Speaker Anonymization.
Interspeech2020
Natalia A. Tomashenko, Brij Mohan Lal Srivastava, Xin Wang 0037, Emmanuel Vincent 0001, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Jose Patino 0001, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco, 
Introducing the VoicePrivacy Initiative.
Interspeech2020
Xin Wang 0037, Junichi Yamagishi, 
Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Model.
ICASSP2022
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.
ICASSP2022
Chao Zhang, Bo Li 0028, Zhiyun Lu, Tara N. Sainath, Shuo-Yiin Chang, 
Improving the Fusion of Acoustic and Text Representations in RNN-T.
Interspeech2022
Shuo-Yiin Chang, Bo Li 0028, Tara N. Sainath, Chao Zhang, Trevor Strohman, Qiao Liang, Yanzhang He, 
Turn-Taking Prediction for Natural Conversational Speech.
Interspeech2022
Shuo-Yiin Chang, Guru Prakash, Zelin Wu, Tara N. Sainath, Bo Li 0028, Qiao Liang, Adam Stambler, Shyam Upadhyay, Manaal Faruqui, Trevor Strohman, 
Streaming Intended Query Detection using E2E Modeling for Continued Conversation.
Interspeech2022
Bo Li 0028, Tara N. Sainath, Ruoming Pang, Shuo-Yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani, 
A Language Agnostic Multilingual Streaming On-Device ASR System.
Interspeech2022
Chao Zhang, Bo Li 0028, Tara N. Sainath, Trevor Strohman, Sepand Mavandadi, Shuo-Yiin Chang, Parisa Haghani, 
Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification.
ICASSP2021
Bo Li 0028, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han 0002, Qiao Liang, Yu Zhang 0033, Trevor Strohman, Yonghui Wu, 
A Better and Faster end-to-end Model for Streaming ASR.
ICASSP2021
Qiujia Li, David Qiu, Yu Zhang 0033, Bo Li 0028, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman, 
Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition.
ICASSP2021
David Qiu, Qiujia Li, Yanzhang He, Yu Zhang 0033, Bo Li 0028, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li 0133, Ke Hu, Tara N. Sainath, Ian McGraw, 
Learning Word-Level Confidence for Subword End-To-End ASR.
ICASSP2021
Jiahui Yu, Chung-Cheng Chiu, Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han 0002, Anmol Gulati, Yonghui Wu, Ruoming Pang, 
FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.
Interspeech2021
Qiujia Li, Yu Zhang 0033, Bo Li 0028, Liangliang Cao, Philip C. Woodland, 
Residual Energy-Based Models for End-to-End Speech Recognition.
Interspeech2021
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Ruoming Pang, David Rybach, Cyril Allauzen, Ehsan Variani, James Qin, Quoc-Nam Le-The, Shuo-Yiin Chang, Bo Li 0028, Anmol Gulati, Jiahui Yu, Chung-Cheng Chiu, Diamantino Caseiro, Wei Li 0133, Qiao Liang, Pat Rondon, 
An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling.
ICLR2021
Jiahui Yu, Wei Han 0002, Anmol Gulati, Chung-Cheng Chiu, Bo Li 0028, Tara N. Sainath, Yonghui Wu, Ruoming Pang, 
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling.
ICASSP2020
Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Ruoming Pang, Yanzhang He, Trevor Strohman, Yonghui Wu, 
Towards Fast and Accurate Streaming End-To-End ASR.
ICASSP2020
Daniel S. Park, Yu Zhang 0033, Chung-Cheng Chiu, Youzheng Chen, Bo Li 0028, William Chan, Quoc V. Le, Yonghui Wu, 
Specaugment on Large Scale Datasets.
ICASSP2020
Tara N. Sainath, Yanzhang He, Bo Li 0028, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li 0133, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alexander Gruenstein, Ke Hu, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirkó Visontai, Yonghui Wu, Yu Zhang 0033, Ding Zhao, 
A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency.
Interspeech2020
Shuo-Yiin Chang, Bo Li 0028, David Rybach, Yanzhang He, Wei Li 0133, Tara N. Sainath, Trevor Strohman, 
Low Latency Speech Recognition Using End-to-End Prefetching.
Interspeech2020
Daniel S. Park, Yu Zhang 0033, Ye Jia, Wei Han 0002, Chung-Cheng Chiu, Bo Li 0028, Yonghui Wu, Quoc V. Le, 
Improved Noisy Student Training for Automatic Speech Recognition.
ICASSP2019
Bo Li 0028, Tara N. Sainath, Ruoming Pang, Zelin Wu, 
Semi-supervised Training for End-to-end Models via Weak Distillation.
ICASSP2019
Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, Qiao Liang, Deepti Bhatia, Yuan Shangguan, Bo Li 0028, Golan Pundak, Khe Chai Sim, Tom Bagby, Shuo-Yiin Chang, Kanishka Rao, Alexander Gruenstein, 
Streaming End-to-end Speech Recognition for Mobile Devices.
TASLP2022
Chenda Li, Zhuo Chen 0006, Yanmin Qian, 
Dual-Path Modeling With Memory Embedding Model for Continuous Speech Separation.
ICASSP2022
Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang 0009, Zhuo Chen 0006, Xuedong Huang 0001, 
Personalized speech enhancement: new models and comprehensive evaluation.
ICASSP2022
Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.
ICASSP2022
Desh Raj, Liang Lu 0001, Zhuo Chen 0006, Yashesh Gaur, Jinyu Li 0001, 
Continuous Streaming Multi-Talker ASR with Dual-Path Transducers.
ICASSP2022
Yixuan Zhang, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li 0001, 
Continuous Speech Separation with Recurrent Selective Attention Network.
Interspeech2022
Sanyuan Chen, Yu Wu 0012, Chengyi Wang 0002, Shujie Liu 0001, Zhuo Chen 0006, Peidong Wang, Gang Liu, Jinyu Li 0001, Jian Wu 0027, Xiangzhan Yu, Furu Wei, 
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
Interspeech2022
Wangyou Zhang, Zhuo Chen 0006, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei, 
Separating Long-Form Speech with Group-wise Permutation Invariant Training.
ICASSP2021
Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Takuya Yoshioka, Shujie Liu 0001, Jinyu Li 0001, Xiangzhan Yu, 
Don't Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer.
ICASSP2021
Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Jinyu Li 0001, Takuya Yoshioka, Chengyi Wang 0002, Shujie Liu 0001, Ming Zhou 0001, 
Continuous Speech Separation with Conformer.
ICASSP2021
Naoyuki Kanda, Zhong Meng, Liang Lu 0001, Yashesh Gaur, Xiaofei Wang 0009, Zhuo Chen 0006, Takuya Yoshioka, 
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.
ICASSP2021
Chenda Li, Zhuo Chen 0006, Yi Luo 0004, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe 0001, Yanmin Qian, 
Dual-Path Modeling for Long Recording Speech Separation in Meetings.
ICASSP2021
Xiong Xiao, Naoyuki Kanda, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao 0008, Gang Liu, Yu Wu 0012, Jian Wu 0027, Shujie Liu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.
Interspeech2021
Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Shujie Liu 0001, Jinyu Li 0001, Xiangzhan Yu, 
Ultra Fast Speech Separation Model with Teacher Student Learning.
Interspeech2021
Sefik Emre Eskimez, Xiaofei Wang 0009, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen 0006, Huaming Wang, Takuya Yoshioka, 
Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement.
Interspeech2021
Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen 0006, Yanxin Hu, Lei Xie 0001, Jian Wu 0027, Hui Bu, Xin Xu, Jun Du, Jingdong Chen, 
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario.
Interspeech2021
Cong Han, Yi Luo 0004, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe 0001, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen 0006, 
Continuous Speech Separation Using Speaker Inventory for Long Recording.
Interspeech2021
Maokui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen 0006, Shinji Watanabe 0001, 
Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speakers.
Interspeech2021
Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
End-to-End Speaker-Attributed ASR with Transformer.
TASLP2022
Tao Wang 0074, Ruibo Fu, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, 
NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation.
TASLP2022
Tao Wang 0074, Jiangyan Yi, Ruibo Fu, Jianhua Tao, Zhengqi Wen, 
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing.
ICASSP2022
Tao Wang 0074, Jiangyan Yi, Liqun Deng, Ruibo Fu, Jianhua Tao, Zhengqi Wen, 
Context-Aware Mask Prediction Network for End-to-End Text-Based Speech Editing.
TASLP2021
Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang 0014, 
Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT.
TASLP2021
Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Zhengkun Tian, Shuai Zhang 0014, 
Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data.
TASLP2021
Cunhang Fan, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Bin Liu 0041, Zhengqi Wen, 
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition.
ICASSP2021
Shuai Zhang 0014, Jiangyan Yi, Zhengkun Tian, Ye Bai, Jianhua Tao, Zhengqi Wen, 
Decoupling Pronunciation and Language for End-to-End Code-Switching Automatic Speech Recognition.
ICASSP2021
Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Tao Wang 0074, Chunyu Qiang, 
Bi-Level Style and Prosody Decoupling Modeling for Personalized End-to-End Speech Synthesis.
ICASSP2021
Tao Wang 0074, Ruibo Fu, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Chunyu Qiang, Shiming Wang, 
Prosody and Voice Factorization for Few-Shot Speaker Adaptation in the Challenge M2voc 2021.
Interspeech2021
Shuai Zhang 0014, Jiangyan Yi, Zhengkun Tian, Ye Bai, Jianhua Tao, Xuefei Liu, Zhengqi Wen, 
End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-Switching Speech Recognition.
Interspeech2021
Zhengkun Tian, Jiangyan Yi, Ye Bai, Jianhua Tao, Shuai Zhang 0014, Zhengqi Wen, 
FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization.
ICASSP2020
Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Tao Wang 0074, 
Focusing on Attention: Prosody Transfer and Adaptative Optimization Strategy for Multi-Speaker End-to-End Speech Synthesis.
Interspeech2020
Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang 0014, 
Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition.
Interspeech2020
Cunhang Fan, Jianhua Tao, Bin Liu 0041, Jiangyan Yi, Zhengqi Wen, 
Gated Recurrent Fusion of Spatial and Spectral Features for Multi-Channel Speech Separation with Deep Embedding Representations.
Interspeech2020
Cunhang Fan, Jianhua Tao, Bin Liu 0041, Jiangyan Yi, Zhengqi Wen, 
Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations.
Interspeech2020
Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Chunyu Qiang, Tao Wang 0074, 
Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis.
Interspeech2020
Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Tao Wang 0074, Chunyu Qiang, 
Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis.
Interspeech2020
Zheng Lian, Zhengqi Wen, Xinyong Zhou, Songbai Pu, Shengkai Zhang, Jianhua Tao, 
ARVC: An Auto-Regressive Voice Conversion System Without Parallel Training Data.
Interspeech2020
Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Ye Bai, Shuai Zhang 0014, Zhengqi Wen, 
Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition.
Interspeech2020
Tao Wang 0074, Xuefei Liu, Jianhua Tao, Jiangyan Yi, Ruibo Fu, Zhengqi Wen, 
Non-Autoregressive End-to-End TTS with Coarse-to-Fine Decoding.
SpeechComm2023
Feng Dang, Hangting Chen, Qi Hu, Pengyuan Zhang, Yonghong Yan 0002, 
First coarse, fine afterward: A lightweight two-stage complex approach for monaural speech enhancement.
TASLP2022
Changfeng Gao, Gaofeng Cheng, Ta Li, Pengyuan Zhang, Yonghong Yan 0002, 
Self-Supervised Pre-Training for Attention-Based Encoder-Decoder ASR Model.
ICASSP2022
Feng Dang, Hangting Chen, Pengyuan Zhang, 
DPT-FSNet: Dual-Path Transformer Based Full-Band and Sub-Band Fusion Network for Speech Enhancement.
ICASSP2022
Keqi Deng, Songjun Cao, Yike Zhang, Long Ma, Gaofeng Cheng, Ji Xu, Pengyuan Zhang, 
Improving CTC-Based Speech Recognition Via Knowledge Transferring from Pre-Trained Language Models.
ICASSP2022
Keqi Deng, Zehui Yang, Shinji Watanabe 0001, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang, 
Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models.
Interspeech2022
Yifan Chen, Yifan Guo, Qingxuan Li, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan 0002, 
Interrelate Training and Searching: A Unified Online Clustering Framework for Speaker Diarization.
Interspeech2022
Hangting Chen, Yi Yang, Feng Dang, Pengyuan Zhang, 
Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output.
Interspeech2022
Chengxin Chen, Pengyuan Zhang, 
CTA-RNN: Channel and Temporal-wise Attention RNN leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition.
Interspeech2022
Yukun Liu, Ta Li, Pengyuan Zhang, Yonghong Yan 0002, 
NAS-SCAE: Searching Compact Attention-based Encoders For End-to-end Automatic Speech Recognition.
Interspeech2022
Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, Lei Xie 0001, Yonghong Yan 0002, 
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset.
Interspeech2022
Lingxuan Ye, Gaofeng Cheng, Runyan Yang, Zehui Yang, Sanli Tian, Pengyuan Zhang, Yonghong Yan 0002, 
Improving Recognition of Out-of-vocabulary Words in E2E Code-switching ASR by Fusing Speech Generation Methods.
Interspeech2022
Yuxiang Zhang, Zhuo Li, Wenchao Wang, Pengyuan Zhang, 
SASV Based on Pre-trained ASV System and Integrated Scoring Module.
Interspeech2022
Xueshuai Zhang, Jiakun Shen, Jun Zhou, Pengyuan Zhang, Yonghong Yan 0002, Zhihua Huang, Yanfen Tang, Yu Wang, Fujie Zhang, Shaoxing Zhang, Aijun Sun, 
Robust Cough Feature Extraction and Classification Method for COVID-19 Cough Detection Based on Vocalization Characteristics.
Interspeech2022
Han Zhu, Li Wang, Gaofeng Cheng, Jindong Wang 0001, Pengyuan Zhang, Yonghong Yan 0002, 
Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR.
Interspeech2022
Han Zhu, Jindong Wang 0001, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan 0002, 
Decoupled Federated Learning for ASR with Non-IID Data.
SpeechComm2021
Danyang Liu, Ji Xu, Pengyuan Zhang, Yonghong Yan 0002, 
A unified system for multilingual speech recognition and language identification.
ICASSP2021
Changfeng Gao, Gaofeng Cheng, Runyan Yang, Han Zhu, Pengyuan Zhang, Yonghong Yan 0002, 
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Text Data.
ICASSP2021
Zuozhen Liu, Ta Li, Pengyuan Zhang, 
RNN-T Based Open-Vocabulary Keyword Spotting in Mandarin with Multi-Level Detection.
Interspeech2021
Ziyi Chen, Pengyuan Zhang, 
TVQVC: Transformer Based Vector Quantized Variational Autoencoder with CTC Loss for Voice Conversion.
Interspeech2021
Feng Dang, Pengyuan Zhang, Hangting Chen, 
Improved Speech Enhancement Using a Complex-Domain GAN with Fused Time-Domain and Time-Frequency Domain Constraints.
ICASSP2022
Zvi Kons, Aharon Satt, Hong-Kwang Kuo, Samuel Thomas 0001, Boaz Carmeli, Ron Hoory, Brian Kingsbury, 
A New Data Augmentation Method for Intent Classification Enhancement and its Application on Spoken Conversation Datasets.
ICASSP2022
Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas 0001, Brian Kingsbury, George Saon, 
Improving End-to-end Models for Set Prediction in Spoken Language Understanding.
ICASSP2022
Vishal Sunder, Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Jatin Ganhotra, Brian Kingsbury, Eric Fosler-Lussier, 
Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding.
ICASSP2022
Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Brian Kingsbury, George Saon, 
Towards Reducing the Need for Speech Training Data to Build Spoken Language Understanding Systems.
ICASSP2022
Samuel Thomas 0001, Brian Kingsbury, George Saon, Hong-Kwang Jeff Kuo, 
Integrating Text Inputs for Training and Adapting RNN Transducer ASR Models.
Interspeech2022
Takashi Fukuda, Samuel Thomas 0001, Masayuki Suzuki, Gakuto Kurata, George Saon, Brian Kingsbury, 
Global RNN Transducer Models For Multi-dialect Speech Recognition.
Interspeech2022
Zvi Kons, Hagai Aronowitz, Edmilson da Silva Morais, Matheus Damasceno, Hong-Kwang Kuo, Samuel Thomas 0001, George Saon, 
Extending RNN-T-based speech recognition systems with emotion and language classification.
Interspeech2022
Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas 0001, Hong-Kwang Kuo, Brian Kingsbury, 
Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems.
TASLP2021
Leda Sari, Mark Hasegawa-Johnson, Samuel Thomas 0001, 
Auxiliary Networks for Joint Speaker Adaptation and Speaker Change Detection.
ICASSP2021
Samuel Thomas 0001, Hong-Kwang Jeff Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, 
RNN Transducer Models for Spoken Language Understanding.
ICASSP2021
Edmilson da Silva Morais, Hong-Kwang Jeff Kuo, Samuel Thomas 0001, Zoltán Tüske, Brian Kingsbury, 
End-to-End Spoken Language Understanding Using Transformer Networks and Self-Supervised Pre-Trained Features.
Interspeech2021
Sujeong Cha, Wangrui Hou, Hyun Jung, My Phung, Michael Picheny, Hong-Kwang Jeff Kuo, Samuel Thomas 0001, Edmilson da Silva Morais, 
Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs.
Interspeech2021
Takashi Fukuda, Samuel Thomas 0001, 
Knowledge Distillation Based Training of Universal ASR Source Models for Cross-Lingual Transfer.
Interspeech2021
Jatin Ganhotra, Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Sachindra Joshi, George Saon, Zoltán Tüske, Brian Kingsbury, 
Integrating Dialog History into End-to-End Spoken Language Understanding Systems.
Interspeech2021
Andrew Rouditchenko, Angie W. Boggust, David Harwath, Samuel Thomas 0001, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogério Feris, Brian Kingsbury, Michael Picheny, James R. Glass, 
Cascaded Multilingual Audio-Visual Learning from Videos.
Interspeech2021
Andrew Rouditchenko, Angie W. Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas 0001, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogério Schmidt Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba 0001, James R. Glass, 
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.
ICASSP2020
Yinghui Huang, Hong-Kwang Kuo, Samuel Thomas 0001, Zvi Kons, Kartik Audhkhasi, Brian Kingsbury, Ron Hoory, Michael Picheny, 
Leveraging Unpaired Text Data for Training End-To-End Speech-to-Intent Systems.
ICASSP2020
Alexandros Koumparoulis, Gerasimos Potamianos, Samuel Thomas 0001, Edmilson da Silva Morais, 
Audio-Assisted Image Inpainting for Talking Faces.
Interspeech2020
Samuel Thomas 0001, Kartik Audhkhasi, Brian Kingsbury, 
Transliteration Based Data Augmentation for Training Multilingual ASR Acoustic Models in Low Resource Settings.
Interspeech2020
Takashi Fukuda, Samuel Thomas 0001, 
Implicit Transfer of Privileged Acoustic Information in a Generalized Knowledge Distillation Framework.
TASLP2022
Zhengjun Yue, Erfan Loweimi, Heidi Christensen, Jon Barker, Zoran Cvetkovic, 
Acoustic Modelling From Raw Source and Filter Components for Dysarthric Speech Recognition.
ICASSP2022
Jack Deadman, Jon Barker, 
Improved Simulation of Realistically-Spatialised Simultaneous Speech Using Multi-Camera Analysis in the CHiME-5 Dataset.
ICASSP2022
Zehai Tu, Jack Deadman, Ning Ma 0002, Jon Barker, 
Auditory-Based Data Augmentation for end-to-end Automatic Speech Recognition.
Interspeech2022
Jon Barker, Michael Akeroyd, Trevor J. Cox, John F. Culling, Jennifer Firth, Simone Graetzer, Holly Griffiths, Lara Harris, Graham Naylor, Zuzanna Podwinska, Eszter Porter, Rhoddy Viveros Muñoz, 
The 1st Clarity Prediction Challenge: A machine learning challenge for hearing aid intelligibility prediction.
Interspeech2022
Jack Deadman, Jon Barker, 
Modelling Turn-taking in Multispeaker Parties for Realistic Data Simulation.
Interspeech2022
Zehai Tu, Ning Ma 0002, Jon Barker, 
Exploiting Hidden Representations from a DNN-based Speech Recogniser for Speech Intelligibility Prediction in Hearing-impaired Listeners.
Interspeech2022
Zehai Tu, Ning Ma 0002, Jon Barker, 
Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction.
Interspeech2022
Zhengjun Yue, Erfan Loweimi, Heidi Christensen, Jon Barker, Zoran Cvetkovic, 
Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs.
Interspeech2022
Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker, 
On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training.
ICASSP2021
Gerardo Roa Dabike, Jon Barker, 
The use of Voice Source Features for Sung Speech Recognition.
ICASSP2021
Zehai Tu, Ning Ma 0002, Jon Barker, 
DHASP: Differentiable Hearing Aid Speech Processing.
ICASSP2021
Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker, 
Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism.
Interspeech2021
Simone Graetzer, Jon Barker, Trevor J. Cox, Michael Akeroyd, John F. Culling, Graham Naylor, Eszter Porter, Rhoddy Viveros Muñoz, 
Clarity-2021 Challenges: Machine Learning Challenges for Advancing Hearing Aid Processing.
Interspeech2021
Zehai Tu, Ning Ma 0002, Jon Barker, 
Optimising Hearing Aid Fittings for Speech in Noise with a Differentiable Hearing Loss Model.
Interspeech2021
Zhengjun Yue, Jon Barker, Heidi Christensen, Cristina McKean, Elaine Ashton, Yvonne Wren, Swapnil Gadgil, Rebecca Bright, 
Parental Spoken Scaffolding and Narrative Skills in Crowd-Sourced Storytelling Samples of Young Children.
Interspeech2021
Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker, 
Teacher-Student MixIT for Unsupervised and Semi-Supervised Speech Separation.
ICASSP2020
Feifei Xiong, Jon Barker, Zhengjun Yue, Heidi Christensen, 
Source Domain Data Selection for Improved Transfer Learning Targeting Dysarthric Speech Recognition.
ICASSP2020
Zhengjun Yue, Feifei Xiong, Heidi Christensen, Jon Barker, 
Exploring Appropriate Acoustic and Language Modelling Choices for Continuous Dysarthric Speech Recognition.
ICASSP2020
Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker, 
On End-to-end Multi-channel Time Domain Speech Separation in Reverberant Environments.
Interspeech2020
Jack Deadman, Jon Barker, 
Simulating Realistically-Spatialised Simultaneous Speech Using Video-Driven Speaker Detection and the CHiME-5 Dataset.
ICASSP2022
Chanho Park, Rehan Ahmad, Thomas Hain, 
Unsupervised Data Selection for Speech Recognition with Contrastive Loss Ratios.
ICASSP2022
Jose Antonio Lopez Saenz, Thomas Hain, 
A Model for Assessor Bias in Automatic Pronunciation Assessment.
Interspeech2022
George Close, Samuel Hollands, Stefan Goetze, Thomas Hain, 
Non-intrusive Speech Intelligibility Metric Prediction for Hearing Impaired Individuals.
Interspeech2022
Muhammad Umar Farooq, Thomas Hain, 
Investigating the Impact of Crosslingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition.
Interspeech2022
Muhammad Umar Farooq, Darshan Adiga Haniya Narayana, Thomas Hain, 
Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion.
ICASSP2021
Qiang Huang 0008, Thomas Hain, 
Improving Audio Anomalies Recognition Using Temporal Convolutional Attention Networks.
ICASSP2021
Cong-Thanh Do, Rama Doddipatla, Thomas Hain, 
Multiple-Hypothesis CTC-Based Semi-Supervised Adaptation of End-to-End Speech Recognition.
Interspeech2021
Anna Ollerenshaw, Md. Asif Jalal, Thomas Hain, 
Insights on Neural Representations for End-to-End Speech Recognition.
Interspeech2020
Qiang Huang 0008, Thomas Hain, 
Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models.
Interspeech2020
Mingjie Chen, Thomas Hain, 
Unsupervised Acoustic Unit Representation Learning for Voice Conversion Using WaveNet Auto-Encoders.
Interspeech2020
Md. Asif Jalal, Rosanna Milner, Thomas Hain, 
Empirical Interpretation of Speech Emotion Perception with Attention Based Model for Speech Emotion Recognition.
Interspeech2020
Md. Asif Jalal, Rosanna Milner, Thomas Hain, Roger K. Moore, 
Removing Bias with Residual Mixture of Multi-View Attention for Speech Emotion Recognition.
Interspeech2020
Hardik B. Sailor, Thomas Hain, 
Multilingual Speech Recognition Using Language-Specific Phoneme Recognition as Auxiliary Task for Indian Languages.
Interspeech2020
Yanpei Shi, Qiang Huang 0008, Thomas Hain, 
Speaker Re-Identification with Speaker Dependent Speech Enhancement.
Interspeech2020
Yanpei Shi, Qiang Huang 0008, Thomas Hain, 
Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification.
Interspeech2020
Lukas Stappen, Georgios Rizos, Madina Hasan, Thomas Hain, Björn W. Schuller, 
Uncertainty-Aware Machine Support for Paper Reviewing on the Interspeech 2019 Submission Corpus.
Interspeech2019
Mortaza Doulaty, Thomas Hain, 
Latent Dirichlet Allocation Based Acoustic Data Selection for Automatic Speech Recognition.
Interspeech2019
Qiang Huang 0008, Thomas Hain, 
Detecting Mismatch Between Speech and Transcription Using Cross-Modal Attention.
Interspeech2019
Md. Asif Jalal, Erfan Loweimi, Roger K. Moore, Thomas Hain, 
Learning Temporal Clusters Using Capsule Routing for Speech Emotion Recognition.
Interspeech2018
Erfan Loweimi, Jon Barker, Thomas Hain, 
On the Usefulness of the Speech Phase Spectrum for Pitch Extraction.
SpeechComm2022
Lauri Tavi, Tomi Kinnunen, Rosa González Hautamäki, 
Improving speaker de-identification with functional data analysis of f0 trajectories.
TASLP2022
Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi, 
Optimizing Tandem Speaker Verification and Anti-Spoofing Systems.
ICASSP2022
Xuechen Liu, Md. Sahidullah, Tomi Kinnunen, 
Learnable Nonlinear Compression for Robust Speaker Verification.
Interspeech2022
Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Ha-Jin Yu, Nicholas W. D. Evans, Tomi Kinnunen, 
SASV 2022: The First Spoofing-Aware Speaker Verification Challenge.
Interspeech2021
Bhusan Chettri, Rosa González Hautamäki, Md. Sahidullah, Tomi Kinnunen, 
Data Quality as Predictor of Voice Anti-Spoofing Generalization.
Interspeech2021
Tomi Kinnunen, Andreas Nautsch, Md. Sahidullah, Nicholas W. D. Evans, Xin Wang 0037, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee, 
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing.
TASLP2020
Tomi Kinnunen, Héctor Delgado, Nicholas W. D. Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang 0037, Md. Sahidullah, Junichi Yamagishi, Douglas A. Reynolds, 
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals.
Interspeech2020
Rohan Kumar Das, Xiaohai Tian, Tomi Kinnunen, Haizhou Li 0001, 
The Attacker's Perspective on Automatic Speaker Verification: An Overview.
Interspeech2020
Rosa González Hautamäki, Tomi Kinnunen, 
Why Did the x-Vector System Miss a Target Speaker? Impact of Acoustic Mismatch Upon Target Score on VoxCeleb Data.
Interspeech2020
Xuechen Liu, Md. Sahidullah, Tomi Kinnunen, 
A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings.
Interspeech2020
Alexey Sholokhov, Tomi Kinnunen, Ville Vestman, Kong Aik Lee, 
Extrapolating False Alarm Rates in Automatic Speaker Verification.
TASLP2019
Akihiro Kato, Tomi H. Kinnunen, 
Statistical Regression Models for Noise Robust F0 Estimation Using Recurrent Deep Neural Networks.
ICASSP2019
Tomi Kinnunen, Rosa González Hautamäki, Ville Vestman, Md. Sahidullah, 
Can We Use Speaker Recognition Technology to Attack Itself? Enhancing Mimicry Attacks Using Automatic Target Speaker Selection.
ICASSP2019
Ville Vestman, Bilal Soomro, Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, 
Who Do I Sound like? Showcasing Speaker Recognition Technology by Youtube Voice Search.
Interspeech2019
Kong Aik Lee, Ville Hautamäki, Tomi H. Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang 0019, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li 0001, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang 0039, Chunlei Zhang, Fahimeh Bahmaninezhad, Héctor Delgado, Massimiliano Todisco, 
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences.
Interspeech2019
Massimiliano Todisco, Xin Wang 0037, Ville Vestman, Md. Sahidullah, Héctor Delgado, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Tomi H. Kinnunen, Kong Aik Lee, 
ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection.
Interspeech2019
Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen, Takafumi Koshinaka, 
Unleashing the Unused Potential of i-Vectors Enabled by GPU Acceleration.
SpeechComm2018
Rosa González Hautamäki, Md. Sahidullah, Ville Hautamäki, Tomi Kinnunen, 
Acoustical and perceptual study of voice disguise by age modification in speaker verification.
SpeechComm2018
Ville Vestman, Dhananjaya N. Gowda, Md. Sahidullah, Paavo Alku, Tomi Kinnunen, 
Speaker recognition from whispered speech: A tutorial survey and an application of time-varying linear prediction.
Interspeech2018
Akihiro Kato, Tomi Kinnunen, 
Waveform to Single Sinusoid Regression to Estimate the F0 Contour from Noisy Speech Using Recurrent Deep Neural Networks.
ICASSP2022
Yizheng Huang, Nana Hou, Nancy F. Chen, 
Progressive Continual Learning for Spoken Keyword Spotting.
Interspeech2022
Perry Lam, Huayun Zhang, Nancy F. Chen, Berrak Sisman, 
EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models.
Interspeech2022
Zhengyuan Liu, Nancy F. Chen, 
Dynamic Sliding Window Modeling for Abstractive Meeting Summarization.
Interspeech2022
Jeremy Heng Meng Wong, Huayun Zhang, Nancy F. Chen, 
Variations of multi-task learning for spoken language assessment.
TASLP2021
Minh Nguyen 0002, Gia H. Ngo, Nancy F. Chen, 
Domain-Shift Conditioning Using Adaptable Filtering Via Hierarchical Embeddings for Robust Chinese Spell Check.
ICASSP2021
Richeng Duan, Nancy F. Chen, 
Senone-Aware Adversarial Multi-Task Training for Unsupervised Child to Adult Speech Adaptation.
Interspeech2021
Ke Shi 0001, Kye Min Tan, Huayun Zhang, Siti Umairah Md. Salleh, Shikang Ni, Nancy F. Chen, 
WittyKiddy: Multilingual Spoken Language Learning for Kids.
Interspeech2021
Huayun Zhang, Ke Shi 0001, Nancy F. Chen, 
Multilingual Speech Evaluation: Case Studies on English, Malay and Tamil.
Interspeech2020
Richeng Duan, Nancy F. Chen, 
Unsupervised Feature Adaptation Using Adversarial Multi-Task Training for Automatic Evaluation of Children's Speech.
Interspeech2020
Yuling Gu, Nancy F. Chen, 
Characterization of Singaporean Children's English: Comparisons to American and British Counterparts Using Archetypal Analysis.
Interspeech2020
Ke Shi 0001, Kye Min Tan, Richeng Duan, Siti Umairah Md. Salleh, Nur Farah Ain Suhaimi, Rajan Vellu, Ngoc Thuy Huong Helen Thai, Nancy F. Chen, 
Computer-Assisted Language Learning System: Automatic Speech Evaluation for Children Learning Malay and Tamil.
TASLP2019
Wei Li 0119, Nancy F. Chen, Sabato Marco Siniscalchi, Chin-Hui Lee, 
Improving Mispronunciation Detection of Mandarin Tones for Non-Native Learners With Soft-Target Tone Labels and BLSTM-Based Deep Tone Models.
TASLP2019
Hoang Gia Ngo, Minh Nguyen 0002, Nancy F. Chen, 
Phonology-Augmented Statistical Framework for Machine Transliteration Using Limited Linguistic Resources.
ACL2019
Zhengyuan Liu, Nancy F. Chen, 
Reading Turn by Turn: Hierarchical Attention Architecture for Spoken Dialogue Comprehension.
SpeechComm2018
Van Tung Pham, Haihua Xu, Xiong Xiao, Nancy F. Chen, Eng Siong Chng, Haizhou Li 0001, 
Re-ranking spoken term detection with acoustic exemplars of keywords.
TASLP2018
Van Hai Do, Nancy F. Chen, Boon Pang Lim, Mark A. Hasegawa-Johnson, 
Multitask Learning for Phone Recognition of Underresourced Languages Using Mismatched Transcription.
ICASSP2018
Wenda Chen, Mark Hasegawa-Johnson, Nancy F. Chen, 
Recognizing Zero-Resourced Languages Based on Mismatched Machine Transcriptions.
ICASSP2018
Wei Li 0119, Nancy F. Chen, Sabato Marco Siniscalchi, Chin-Hui Lee, 
Improving Mandarin Tone Mispronunciation Detection for Non-Native Learners with Soft-Target Tone Labels and BLSTM-Based Deep Models.
Interspeech2018
Wenda Chen, Mark Hasegawa-Johnson, Nancy F. Chen, 
Topic and Keyword Identification for Low-resourced Speech Using Cross-Language Transfer Learning.
EMNLP2018
Minh Nguyen 0002, Hoang Gia Ngo, Nancy F. Chen, 
Multimodal neural pronunciation modeling for spoken languages with logographic origin.
TASLP2022
Xiaochun An, Frank K. Soong, Lei Xie 0001, 
Disentangling Style and Speaker Attributes for TTS Style Transfer.
TASLP2022
Liumeng Xue, Frank K. Soong, Shaofei Zhang, Lei Xie 0001, 
ParaTTS: Learning Linguistic and Prosodic Cross-Sentence Information in Paragraph-Based TTS.
ICASSP2022
Shaoguang Mao, Frank K. Soong, Yan Xia 0005, Jonathan Tien, 
A Universal Ordinal Regression for Assessing Phoneme-Level Pronunciation.
ICASSP2022
Wenxuan Ye, Shaoguang Mao, Frank K. Soong, Wenshan Wu, Yan Xia 0005, Jonathan Tien, Zhiyong Wu 0001, 
An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings.
Interspeech2022
Mutian He 0001, Jingzhou Yang, Lei He 0005, Frank K. Soong, 
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge.
Interspeech2022
Haohan Guo, Feng-Long Xie, Frank K. Soong, Xixin Wu, Helen Meng, 
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS.
ICASSP2021
Yichong Leng, Xu Tan 0003, Sheng Zhao, Frank K. Soong, Xiang-Yang Li 0001, Tao Qin, 
MBNET: MOS Prediction for Synthesized Speech with Mean-Bias Network.
ICASSP2021
Bin Su, Shaoguang Mao, Frank K. Soong, Yan Xia 0005, Jonathan Tien, Zhiyong Wu 0001, 
Improving Pronunciation Assessment Via Ordinal Regression with Anchored Reference Samples.
ICASSP2021
Feng-Long Xie, Xinhui Li, Wen-Chao Su, Li Lu, Frank K. Soong, 
A New High Quality Trajectory Tiling Based Hybrid TTS In Real Time.
Interspeech2021
Xiaochun An, Frank K. Soong, Lei Xie 0001, 
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-End Neural TTS.
ICASSP2020
Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank K. Soong, Hong-Goo Kang, 
Improving LPCNET-Based Text-to-Speech with Linear Prediction-Structured Mixture Density Network.
ICASSP2020
Yujia Xiao, Lei He 0005, Huaiping Ming, Frank K. Soong, 
Improving Prosody with Linguistic and BERT Derived Features in Multi-Speaker Based Mandarin Chinese Neural TTS.
ICASSP2020
Feng-Long Xie, Xinhui Li, Bo Liu, Yibin Zheng, Li Meng, Li Lu, Frank K. Soong, 
An Improved Frame-Unit-Selection Based Voice Conversion System Without Parallel Training Data.
Interspeech2020
Yang Cui, Xi Wang 0016, Lei He 0005, Frank K. Soong, 
An Efficient Subband Linear Prediction for LPCNet-Based Neural Synthesis.
Interspeech2020
Yuanbo Hou, Frank K. Soong, Jian Luan 0001, Shengchen Li, 
Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music.
SpeechComm2019
Feng-Long Xie, Frank K. Soong, Haifeng Li 0001, 
Voice conversion with SI-DNN and KL divergence based mapping without parallel training data.
ICASSP2019
Jingyong Hou, Pengcheng Guo, Sining Sun, Frank K. Soong, Wenping Hu, Lei Xie 0001, 
Domain Adversarial Training for Improving Keyword Spotting Performance of ESL Speech.
ICASSP2019
Shaoguang Mao, Zhiyong Wu 0001, Jingshuai Jiang, Peiyun Liu, Frank K. Soong, 
NN-based Ordinal Regression for Assessing Fluency of ESL Speech.
ICASSP2019
Ke Wang, Frank K. Soong, Lei Xie 0001, 
A Pitch-aware Approach to Single-channel Speech Separation.
Interspeech2019
Haohan Guo, Frank K. Soong, Lei He 0005, Lei Xie 0001, 
A New GAN-Based End-to-End TTS Training Algorithm.
ICASSP2022
Thomas Bohnstingl, Ayush Garg 0006, Stanislaw Wozniak, George Saon, Evangelos Eleftheriou, Angeliki Pantazi, 
Speech Recognition Using Biologically-Inspired Neural Networks.
ICASSP2022
Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas 0001, Brian Kingsbury, George Saon, 
Improving End-to-end Models for Set Prediction in Spoken Language Understanding.
ICASSP2022
Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Brian Kingsbury, George Saon, 
Towards Reducing the Need for Speech Training Data to Build Spoken Language Understanding Systems.
ICASSP2022
Samuel Thomas 0001, Brian Kingsbury, George Saon, Hong-Kwang Jeff Kuo, 
Integrating Text Inputs for Training and Adapting RNN Transducer ASR Models.
Interspeech2022
Xiaodong Cui, George Saon, Tohru Nagano, Masayuki Suzuki, Takashi Fukuda, Brian Kingsbury, Gakuto Kurata, 
Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing.
Interspeech2022
Andrea Fasoli, Chia-Yu Chen, Mauricio J. Serrano, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Kailash Gopalakrishnan, 
Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization.
Interspeech2022
Takashi Fukuda, Samuel Thomas 0001, Masayuki Suzuki, Gakuto Kurata, George Saon, Brian Kingsbury, 
Global RNN Transducer Models For Multi-dialect Speech Recognition.
Interspeech2022
Zvi Kons, Hagai Aronowitz, Edmilson da Silva Morais, Matheus Damasceno, Hong-Kwang Kuo, Samuel Thomas 0001, George Saon, 
Extending RNN-T-based speech recognition systems with emotion and language classification.
Interspeech2022
Jiatong Shi, George Saon, David Haws, Shinji Watanabe 0001, Brian Kingsbury, 
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States.
Interspeech2022
Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Nobuyasu Itoh, George Saon, 
Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems.
ICASSP2021
Samuel Thomas 0001, Hong-Kwang Jeff Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, 
RNN Transducer Models for Spoken Language Understanding.
ICASSP2021
George Saon, Zoltán Tüske, Daniel Bolaños, Brian Kingsbury, 
Advancing RNN Transducer Technology for Speech Recognition.
Interspeech2021
Xiaodong Cui, Brian Kingsbury, George Saon, David Haws, Zoltán Tüske, 
Reducing Exposure Bias in Training Recurrent Neural Network Transducers.
Interspeech2021
Andrea Fasoli, Chia-Yu Chen, Mauricio J. Serrano, Xiao Sun, Naigang Wang, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Wei Zhang 0022, Zoltán Tüske, Kailash Gopalakrishnan, 
4-Bit Quantization of LSTM-Based Speech Recognition Models.
Interspeech2021
Jatin Ganhotra, Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Sachindra Joshi, George Saon, Zoltán Tüske, Brian Kingsbury, 
Integrating Dialog History into End-to-End Spoken Language Understanding Systems.
Interspeech2021
Gakuto Kurata, George Saon, Brian Kingsbury, David Haws, Zoltán Tüske, 
Improving Customization of Neural Transducers by Mitigating Acoustic Mismatch of Synthesized Audio.
Interspeech2021
Zoltán Tüske, George Saon, Brian Kingsbury, 
On the Limit of English Conversational Speech Recognition.
Interspeech2020
Gakuto Kurata, George Saon, 
Knowledge Distillation from Offline to Streaming RNN Transducer for End-to-End Speech Recognition.
Interspeech2020
Zoltán Tüske, George Saon, Kartik Audhkhasi, Brian Kingsbury, 
Single Headed Attention Based Sequence-to-Sequence Model for State-of-the-Art Results on Switchboard.
ICASSP2019
George Saon, Zoltán Tüske, Kartik Audhkhasi, Brian Kingsbury, 
Sequence Noise Injected Training for End-to-end Speech Recognition.
ICASSP2022
Hemant A. Patil, Ankur T. Patil, Aastha Kachhi, 
Constant Q Cepstral Coefficients for Classification of Normal vs. Pathological Infant Cry.
SpeechComm2021
Madhu R. Kamble, Hemlata Tak, Hemant A. Patil, 
Amplitude and Frequency Modulation-Based Features for Detection of Replay Spoof Speech.
SpeechComm2021
Meet H. Soni, Hemant A. Patil, 
Non-intrusive quality assessment of noise-suppressed speech using unsupervised deep features.
Interspeech2021
Gauri P. Prajapati, Dipesh K. Singh, Preet P. Amin, Hemant A. Patil, 
Voice Privacy Through x-Vector and CycleGAN-Based Anonymization.
ICASSP2020
Harshit Malaviya, Jui Shah, Maitreya Patel, Jalansh Munshi, Hemant A. Patil, 
Mspec-Net: Multi-Domain Speech Conversion Network.
ICASSP2019
Madhu R. Kamble, Hemant A. Patil, 
Analysis of Reverberation via Teager Energy Features for Replay Spoof Speech Detection.
ICASSP2019
Nirmesh J. Shah, Hemant A. Patil, 
Novel Metric Learning for Non-parallel Voice Conversion.
Interspeech2019
Ankur T. Patil, Rajul Acharya, Pulikonda Krishna Aditya Sai, Hemant A. Patil, 
Energy Separation-Based Instantaneous Frequency Estimation for Cochlear Cepstral Feature for Replay Spoof Detection.
Interspeech2019
Nirmesh J. Shah, Hemant A. Patil, 
Phone Aware Nearest Neighbor Technique Using Spectral Transition Measure for Non-Parallel Voice Conversion.
Interspeech2019
Nirmesh J. Shah, Hardik B. Sailor, Hemant A. Patil, 
Whether to Pretrain DNN or not?: An Empirical Analysis for Voice Conversion.
Interspeech2018
Madhu R. Kamble, Hemant A. Patil, 
Novel Variable Length Energy Separation Algorithm Using Instantaneous Amplitude Features for Replay Detection.
Interspeech2018
Madhu R. Kamble, Hemlata Tak, Hemant A. Patil, 
Effectiveness of Speech Demodulation-Based Features for Replay Detection.
Interspeech2018
Hardik B. Sailor, Maddala Venkata Siva Krishna, Diksha Chhabra, Ankur T. Patil, Madhu R. Kamble, Hemant A. Patil, 
DA-IICT/IIITV System for Low Resource Speech Recognition Challenge 2018.
Interspeech2018
Hardik B. Sailor, Madhu R. Kamble, Hemant A. Patil, 
Auditory Filterbank Learning for Temporal Modulation Features in Replay Spoof Speech Detection.
Interspeech2018
Hardik B. Sailor, Hemant A. Patil, 
Auditory Filterbank Learning Using ConvRBM for Infant Cry Classification.
Interspeech2018
Nirmesh J. Shah, Maulik C. Madhavi, Hemant A. Patil, 
Unsupervised Vocal Tract Length Warped Posterior Features for Non-Parallel Voice Conversion.
Interspeech2018
Nirmesh J. Shah, Hemant A. Patil, 
Effectiveness of Dynamic Features in INCA and Temporal Context-INCA.
Interspeech2018
Neil Shah, Nirmesh J. Shah, Hemant A. Patil, 
Effectiveness of Generative Adversarial Network for Non-Audible Murmur-to-Whisper Speech Conversion.
Interspeech2018
Hemlata Tak, Hemant A. Patil, 
Novel Linear Frequency Residual Cepstral Features for Replay Attack Detection.
Interspeech2018
Prasad Tapkir, Hemant A. Patil, 
Novel Empirical Mode Decomposition Cepstral Features for Replay Spoof Detection.
SpeechComm2023
Jiangyan Yi, Jianhua Tao, Ye Bai, Zhengkun Tian, Cunhang Fan, 
Transfer knowledge for punctuation prediction via adversarial training.
TASLP2022
Tao Wang 0074, Ruibo Fu, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, 
NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation.
TASLP2022
Tao Wang 0074, Jiangyan Yi, Ruibo Fu, Jianhua Tao, Zhengqi Wen, 
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing.
ICASSP2022
Tao Wang 0074, Jiangyan Yi, Liqun Deng, Ruibo Fu, Jianhua Tao, Zhengqi Wen, 
Context-Aware Mask Prediction Network for End-to-End Text-Based Speech Editing.
Interspeech2022
Shuai Zhang 0014, Jiangyan Yi, Zhengkun Tian, Jianhua Tao, Yu Ting Yeung, Liqun Deng, 
Reducing Multilingual Context Confusion for End-to-End Code-Switching Automatic Speech Recognition.
TASLP2021
Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang 0014, 
Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT.
TASLP2021
Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Zhengkun Tian, Shuai Zhang 0014, 
Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data.
TASLP2021
Cunhang Fan, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Bin Liu 0041, Zhengqi Wen, 
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition.
ICASSP2021
Shuai Zhang 0014, Jiangyan Yi, Zhengkun Tian, Ye Bai, Jianhua Tao, Zhengqi Wen, 
Decoupling Pronunciation and Language for End-to-End Code-Switching Automatic Speech Recognition.
ICASSP2021
Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Tao Wang 0074, Chunyu Qiang, 
Bi-Level Style and Prosody Decoupling Modeling for Personalized End-to-End Speech Synthesis.
ICASSP2021
Tao Wang 0074, Ruibo Fu, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Chunyu Qiang, Shiming Wang, 
Prosody and Voice Factorization for Few-Shot Speaker Adaptation in the Challenge M2voc 2021.
Interspeech2021
Shuai Zhang 0014, Jiangyan Yi, Zhengkun Tian, Ye Bai, Jianhua Tao, Xuefei Liu, Zhengqi Wen, 
End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-Switching Speech Recognition.
Interspeech2021
Haoxin Ma, Jiangyan Yi, Jianhua Tao, Ye Bai, Zhengkun Tian, Chenglong Wang, 
Continual Learning for Fake Audio Detection.
Interspeech2021
Zhengkun Tian, Jiangyan Yi, Ye Bai, Jianhua Tao, Shuai Zhang 0014, Zhengqi Wen, 
FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization.
Interspeech2021
Jiangyan Yi, Ye Bai, Jianhua Tao, Haoxin Ma, Zhengkun Tian, Chenglong Wang, Tao Wang 0074, Ruibo Fu, 
Half-Truth: A Partially Fake Audio Detection Dataset.
ICASSP2020
Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Tao Wang 0074, 
Focusing on Attention: Prosody Transfer and Adaptative Optimization Strategy for Multi-Speaker End-to-End Speech Synthesis.
Interspeech2020
Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang 0014, 
Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition.
Interspeech2020
Cunhang Fan, Jianhua Tao, Bin Liu 0041, Jiangyan Yi, Zhengqi Wen, 
Gated Recurrent Fusion of Spatial and Spectral Features for Multi-Channel Speech Separation with Deep Embedding Representations.
Interspeech2020
Cunhang Fan, Jianhua Tao, Bin Liu 0041, Jiangyan Yi, Zhengqi Wen, 
Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations.
Interspeech2020
Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Chunyu Qiang, Tao Wang 0074, 
Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis.
SpeechComm2023
Iván López-Espejo, Amin Edraki, Wai-Yip Chan, Zheng-Hua Tan, Jesper Jensen 0001, 
On the deficiency of intelligibility metrics as proxies for subjective intelligibility.
TASLP2022
Poul Hoang, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen 0001, 
Multichannel Speech Enhancement With Own Voice-Based Interfering Speech Suppression for Hearing Assistive Devices.
ICASSP2022
Andreas Jonas Fuglsig, Jan Østergaard, Jesper Jensen 0001, Lars Søndergaard Bertelsen, Peter Mariager, Zheng-Hua Tan, 
Joint Far- and Near-End Speech Intelligibility Enhancement Based on the Approximated Speech Intelligibility Index.
TASLP2021
Amin Edraki, Wai-Yip Chan, Jesper Jensen 0001, Daniel Fogerty, 
Speech Intelligibility Prediction Using Spectro-Temporal Modulation Analysis.
TASLP2021
Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen 0001, 
A Novel Loss Function and Training Strategy for Noise-Robust Keyword Spotting.
TASLP2021
Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu 0004, Meng Yu 0003, Dong Yu 0001, Jesper Jensen 0001, 
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation.
ICASSP2021
Giovanni Morrone, Daniel Michelsanti, Zheng-Hua Tan, Jesper Jensen 0001, 
Audio-Visual Speech Inpainting with Deep Learning.
Interspeech2021
Amin Edraki, Wai-Yip Chan, Jesper Jensen 0001, Daniel Fogerty, 
A Spectro-Temporal Glimpsing Index (STGI) for Speech Intelligibility Prediction.
SpeechComm2020
Daniel Michelsanti, Zheng-Hua Tan, Sigurdur Sigurdsson, Jesper Jensen 0001, 
Deep-learning-based audio-visual speech enhancement in presence of Lombard effect.
TASLP2020
Morten Kolbæk, Zheng-Hua Tan, Søren Holdt Jensen, Jesper Jensen 0001, 
On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement.
TASLP2020
Juan M. Martín-Doñas, Jesper Jensen 0001, Zheng-Hua Tan, Angel M. Gomez, Antonio M. Peinado, 
Online Multichannel Speech Enhancement Based on Recursive EM and DNN-Based Speech Presence Estimation.
ICASSP2020
Poul Hoang, Zheng-Hua Tan, Thomas Lunner, Jan Mark de Haan, Jesper Jensen 0001, 
Maximum Likelihood Estimation of the Interference-Plus-Noise Cross Power Spectral Density Matrix for Own Voice Retrieval.
ICASSP2020
Mathias Bach Pedersen, Asger Heidemann Andersen, Søren Holdt Jensen, Jesper Jensen 0001, 
A Neural Network for Monaural Intrusive Speech Intelligibility Prediction.
Interspeech2020
Daniel Michelsanti, Olga Slizovskaia, Gloria Haro, Emilia Gómez, Zheng-Hua Tan, Jesper Jensen 0001, 
Vocoder-Based Speech Synthesis from Silent Videos.
Interspeech2020
Mathias Bach Pedersen, Morten Kolbæk, Asger Heidemann Andersen, Søren Holdt Jensen, Jesper Jensen 0001, 
End-to-End Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks.
TASLP2019
Mohsen Zareian Jahromi, Adel Zahedi, Jesper Jensen 0001, Jan Østergaard, 
Information Loss in the Human Auditory System.
TASLP2019
Morten Kolbæk, Zheng-Hua Tan, Jesper Jensen 0001, 
On the Relationship Between Short-Time Objective Intelligibility and Short-Time Spectral-Amplitude Mean-Square Error for Speech Enhancement.
ICASSP2019
Daniel Michelsanti, Zheng-Hua Tan, Sigurdur Sigurdsson, Jesper Jensen 0001, 
On Training Targets and Objective Functions for Deep-learning-based Audio-visual Speech Enhancement.
Interspeech2019
Amin Edraki, Wai-Yip Chan, Jesper Jensen 0001, Daniel Fogerty, 
Improvement and Assessment of Spectro-Temporal Modulation Analysis for Speech Intelligibility Estimation.
Interspeech2019
Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen 0001, 
Keyword Spotting for Hearing Assistive Devices Robust to External Speakers.
Interspeech2022
Rong Chao, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao 0001, 
Perceptual Contrast Stretching on Target Feature for Speech Enhancement.
Interspeech2022
Kai Li 0018, Sheng Li 0010, Xugang Lu, Masato Akagi, Meng Liu, Lin Zhang, Chang Zeng, Longbiao Wang, Jianwu Dang 0001, Masashi Unoki, 
Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection.
Interspeech2022
Peng Shen, Xugang Lu, Hisashi Kawai, 
Transducer-based language embedding for spoken language identification.
TASLP2021
Xugang Lu, Peng Shen, Yu Tsao 0001, Hisashi Kawai, 
Coupling a Generative Model With a Discriminative Learning Framework for Speaker Verification.
ICASSP2021
Xugang Lu, Peng Shen, Yu Tsao 0001, Hisashi Kawai, 
Unsupervised Neural Adaptation Model Based on Optimal Transport for Spoken Language Identification.
Interspeech2021
Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao 0001, 
MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement.
Interspeech2021
Tsun-An Hsieh, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao 0001, 
Improving Perceptual Quality by Phone-Fortified Perceptual Loss Using Wasserstein Distance for Speech Enhancement.
NeurIPS2021
Hsin-Yi Lin, Huan-Hsin Tseng, Xugang Lu, Yu Tsao 0001, 
Unsupervised Noise Adaptive Speech Enhancement by Discriminator-Constrained Optimal Transport.
TASLP2020
Peng Shen, Xugang Lu, Sheng Li 0010, Hisashi Kawai, 
Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification.
TASLP2020
Cheng Yu, Ryandhimas E. Zezario, Syu-Siang Wang, Jonathan Sherman, Yi-Yen Hsieh, Xugang Lu, Hsin-Min Wang, Yu Tsao 0001, 
Speech Enhancement Based on Denoising Autoencoder With Multi-Branched Encoders.
ICASSP2020
Ryandhimas E. Zezario, Tassadaq Hussain, Xugang Lu, Hsin-Min Wang, Yu Tsao 0001, 
Self-Supervised Denoising Autoencoder with Linear Regression Decoder for Speech Enhancement.
Interspeech2020
Yen-Ju Lu, Chien-Feng Liao, Xugang Lu, Jeih-weih Hung, Yu Tsao 0001, 
Incorporating Broad Phonetic Information for Speech Enhancement.
Interspeech2020
Peng Shen, Xugang Lu, Hisashi Kawai, 
Investigation of NICT Submission for Short-Duration Speaker Verification Challenge 2020.
ICASSP2019
Peng Shen, Xugang Lu, Sheng Li 0010, Hisashi Kawai, 
Interactive Learning of Teacher-student Model for Short Utterance Spoken Language Identification.
Interspeech2019
Sheng Li 0010, Chenchen Ding, Xugang Lu, Peng Shen, Tatsuya Kawahara, Hisashi Kawai, 
End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition.
Interspeech2019
Sheng Li 0010, Xugang Lu, Chenchen Ding, Peng Shen, Tatsuya Kawahara, Hisashi Kawai, 
Investigating Radical-Based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese.
Interspeech2019
Sheng Li 0010, Raj Dabre, Xugang Lu, Peng Shen, Tatsuya Kawahara, Hisashi Kawai, 
Improving Transformer-Based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation.
Interspeech2019
Chien-Feng Liao, Yu Tsao 0001, Xugang Lu, Hisashi Kawai, 
Incorporating Symbolic Sequential Modeling for Speech Enhancement.
Interspeech2019
Xugang Lu, Peng Shen, Sheng Li 0010, Yu Tsao 0001, Hisashi Kawai, 
Class-Wise Centroid Distance Metric Learning for Acoustic Event Detection.
Interspeech2019
Ryandhimas E. Zezario, Szu-Wei Fu, Xugang Lu, Hsin-Min Wang, Yu Tsao 0001, 
Specialized Speech Enhancement Model Selection Based on Learned Non-Intrusive Quality Assessment Metric.
SpeechComm2021
Tharshini Gunendradasan, Eliathamby Ambikairajah, Julien Epps, Vidhyasaharan Sethu, Haizhou Li 0001, 
An adaptive transmission line cochlear model based front-end for replay attack detection.
SpeechComm2021
Brian Stasak, Julien Epps, Heather T. Schatten, Ivan W. Miller, Emily Mower Provost, Michael F. Armey, 
Read speech voice quality and disfluency in individuals with recent suicidal ideation or suicide attempt.
ICASSP2021
Brian Stasak, Zhaocheng Huang, Dale Joachim, Julien Epps, 
Automatic Elicitation Compliance for Short-Duration Speech Based Depression Detection.
Interspeech2021
Beena Ahmed, Kirrie J. Ballard, Denis Burnham, Tharmakulasingam Sirojan, Hadi Mehmood, Dominique Estival, Elise Baker, Felicity Cox, Joanne Arciuli, Titia Benders, Katherine Demuth, Barbara Kelly, Chloé Diskin-Holdaway, Mostafa Ali Shahin, Vidhyasaharan Sethu, Julien Epps, Chwee Beng Lee, Eliathamby Ambikairajah, 
AusKidTalk: An Auditory-Visual Corpus of 3- to 12-Year-Old Australian Children's Speech.
SpeechComm2020
Brian Stasak, Julien Epps, Roland Goecke, 
Automatic depression classification based on affective read sentences: Opportunities for text-dependent analysis.
ICASSP2020
Zhaocheng Huang, Julien Epps, Dale Joachim, 
Exploiting Vocal Tract Coordination Using Dilated CNNs for Depression Detection in Naturalistic Environments.
Interspeech2020
Zhaocheng Huang, Julien Epps, Dale Joachim, Brian Stasak, James R. Williamson, Thomas F. Quatieri, 
Domain Adaptation for Enhancing Speech-Based Depression Detection in Natural Environmental Conditions Using Dilated CNNs.
Interspeech2020
Sadari Jayawardena, Julien Epps, Zhaocheng Huang, 
How Ordinal Are Your Data?
Interspeech2020
Hang Li, Siyuan Chen, Julien Epps, 
Augmenting Turn-Taking Prediction with Wearable Eye Activity During Conversation.
Interspeech2020
Prasanth Parasu, Julien Epps, Kaavya Sriskandaraja, Gajan Suthokumar, 
Investigating Light-ResNet Architecture for Spoofing Detection Under Mismatched Conditions.
Interspeech2020
Mostafa Ali Shahin, Renée Lu, Julien Epps, Beena Ahmed, 
UNSW System Description for the Shared Task on Automatic Speech Recognition for Non-Native Children's Speech.
ICASSP2019
Tharshini Gunendradasan, Saad Irtza, Eliathamby Ambikairajah, Julien Epps, 
Transmission Line Cochlear Model Based AM-FM Features for Replay Attack Detection.
ICASSP2019
Zhaocheng Huang, Julien Epps, Dale Joachim, 
Speech Landmark Bigrams for Depression Detection from Naturalistic Smartphone Speech.
ICASSP2019
Buddhi Wickramasinghe, Eliathamby Ambikairajah, Julien Epps, Vidhyasaharan Sethu, Haizhou Li 0001, 
Auditory Inspired Spatial Differentiation for Replay Spoofing Attack Detection.
Interspeech2019
Tharshini Gunendradasan, Eliathamby Ambikairajah, Julien Epps, Haizhou Li 0001, 
An Adaptive-Q Cochlear Model for Replay Spoofing Detection.
Interspeech2019
Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Julien Epps, 
Direct Modelling of Speech Emotion from Raw Speech.
Interspeech2019
Buddhi Wickramasinghe, Eliathamby Ambikairajah, Julien Epps, 
Biologically Inspired Adaptive-Q Filterbanks for Replay Spoofing Attack Detection.
Interspeech2018
Mia Atcheson, Vidhyasaharan Sethu, Julien Epps, 
Demonstrating and Modelling Systematic Time-varying Annotator Disagreement in Continuous Emotion Annotation.
Interspeech2018
Tharshini Gunendradasan, Buddhi Wickramasinghe, Phu Ngoc Le, Eliathamby Ambikairajah, Julien Epps, 
Detection of Replay-Spoofing Attacks Using Frequency Modulation Features.
Interspeech2018
Zhaocheng Huang, Julien Epps, Dale Joachim, Michael Chen, 
Depression Detection from Short Utterances via Diverse Smartphones in Natural Environmental Conditions.
ICASSP2022
Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe 0001, 
Integrating Multiple ASR Systems into NLP Backend with Attention Fusion.
Interspeech2022
Koharu Horii, Meiko Fukuda, Kengo Ohta, Ryota Nishimura, Atsunori Ogawa, Norihide Kitaoka, 
End-to-End Spontaneous Speech Recognition Using Disfluency Labeling.
ICASSP2021
Atsunori Ogawa, Naohiro Tawara, Takatomo Kano, Marc Delcroix, 
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition.
Interspeech2021
Ayako Yamamoto, Toshio Irino, Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, 
Comparison of Remote Experiments Using Crowdsourcing and Laboratory Experiments on Speech Intelligibility.
ICASSP2020
Naohiro Tawara, Hosana Kamiyama, Satoshi Kobashikawa, Atsunori Ogawa, 
Improving Speaker-Attribute Estimation by Voting Based on Speaker Cluster Information.
ICASSP2020
Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Marc Delcroix, Tetsuji Ogawa, 
Frame-Level Phoneme-Invariant Speaker Embedding for Text-Independent Speaker Recognition on Extremely Short Utterances.
Interspeech2020
Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Toshio Irino, 
Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-Based ASR System.
Interspeech2020
Atsunori Ogawa, Naohiro Tawara, Marc Delcroix, 
Language Model Data Augmentation Based on Text Domain Transfer.
ICASSP2019
Michael Hentschel, Marc Delcroix, Atsunori Ogawa, Tomoharu Iwata, Tomohiro Nakatani, 
A Unified Framework for Feature-based Domain Adaptation of Neural Network Language Models.
ICASSP2019
Shigeki Karita, Shinji Watanabe 0001, Tomoharu Iwata, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani, 
Semi-supervised End-to-end Speech Recognition Using Text-to-speech and Autoencoders.
ICASSP2019
Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Tomohiro Nakatani, 
A Unified Framework for Neural Speech Separation and Extraction.
ICASSP2019
Atsunori Ogawa, Tsutomu Hirao, Tomohiro Nakatani, Masaaki Nagata, 
ILP-based Compressive Speech Summarization with Content Word Coverage Maximization and Its Oracle Performance Analysis.
Interspeech2019
Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Katsuhiko Yamamoto, Toshio Irino, 
Predicting Speech Intelligibility of Enhanced Speech Using Phone Accuracy of DNN-Based ASR System.
Interspeech2019
Marc Delcroix, Shinji Watanabe 0001, Tsubasa Ochiai, Keisuke Kinoshita, Shigeki Karita, Atsunori Ogawa, Tomohiro Nakatani, 
End-to-End SpeakerBeam for Single Channel Target Speech Recognition.
Interspeech2019
Shigeki Karita, Nelson Enrique Yalta Soplin, Shinji Watanabe 0001, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani, 
Improving Transformer-Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration.
Interspeech2019
Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Tomohiro Nakatani, 
Multimodal SpeakerBeam: Single Channel Target Speech Extraction with Audio-Visual Speaker Clues.
Interspeech2019
Atsunori Ogawa, Marc Delcroix, Shigeki Karita, Tomohiro Nakatani, 
Improved Deep Duel Model for Rescoring N-Best Speech Recognition List Using Backward LSTMLM and Ensemble Encoders.
TASLP2018
Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Christian Huemmer 0001, Tomohiro Nakatani, 
Context Adaptive Neural Network Based Acoustic Models for Rapid Adaptation.
ICASSP2018
Marc Delcroix, Katerina Zmolíková, Keisuke Kinoshita, Atsunori Ogawa, Tomohiro Nakatani, 
Single Channel Target Speaker Extraction and Recognition with Speaker Beam.
ICASSP2018
Tsuyoshi Morioka, Naohiro Tawara, Tetsuji Ogawa, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi, 
Language Model Domain Adaptation Via Recurrent Neural Networks with Domain-Shared and Domain-Specific Representations.
ICASSP2022
Jee-weon Jung, Hee-Soo Heo, Hemlata Tak, Hye-jin Shim, Joon Son Chung, Bong-Jin Lee, Ha-Jin Yu, Nicholas W. D. Evans, 
AASIST: Audio Anti-Spoofing Using Integrated Spectro-Temporal Graph Attention Networks.
ICASSP2022
Hemlata Tak, Madhu R. Kamble, Jose Patino 0001, Massimiliano Todisco, Nicholas W. D. Evans, 
Rawboost: A Raw Data Boosting and Augmentation Method Applied to Automatic Speaker Verification Anti-Spoofing.
Interspeech2022
Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Ha-Jin Yu, Nicholas W. D. Evans, Tomi Kinnunen, 
SASV 2022: The First Spoofing-Aware Speaker Verification Challenge.
ICASSP2021
Hemlata Tak, Jose Patino 0001, Massimiliano Todisco, Andreas Nautsch, Nicholas W. D. Evans, Anthony Larcher, 
End-to-End anti-spoofing with RawNet2.
Interspeech2021
Jose Patino 0001, Natalia A. Tomashenko, Massimiliano Todisco, Andreas Nautsch, Nicholas W. D. Evans, 
Speaker Anonymisation Using the McAdams Coefficient.
Interspeech2021
Oubaïda Chouchane, Baptiste Brossier, Jorge Esteban Gamboa Gamboa, Thomas Lardy, Hemlata Tak, Orhan Ermis, Madhu R. Kamble, Jose Patino 0001, Nicholas W. D. Evans, Melek Önen, Massimiliano Todisco, 
Privacy-Preserving Voice Anti-Spoofing Using Secure Multi-Party Computation.
Interspeech2021
Wanying Ge, Michele Panariello, Jose Patino 0001, Massimiliano Todisco, Nicholas W. D. Evans, 
Partially-Connected Differentiable Architecture Search for Deepfake and Spoofing Detection.
Interspeech2021
Madhu R. Kamble, José Andrés González López, Teresa Grau, Juan M. Espín, Lorenzo Cascioli, Yiqing Huang, Alejandro Gomez-Alanis, Jose Patino 0001, Roberto Font, Antonio M. Peinado, Angel M. Gomez, Nicholas W. D. Evans, Maria A. Zuluaga, Massimiliano Todisco, 
PANACEA Cough Sound-Based Diagnosis of COVID-19 for the DiCOVA 2021 Challenge.
Interspeech2021
Tomi Kinnunen, Andreas Nautsch, Md. Sahidullah, Nicholas W. D. Evans, Xin Wang 0037, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee, 
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing.
Interspeech2021
Hemlata Tak, Jee-weon Jung, Jose Patino 0001, Massimiliano Todisco, Nicholas W. D. Evans, 
Graph Attention Networks for Anti-Spoofing.
Interspeech2021
Lin Zhang, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, Jose Patino 0001, Nicholas W. D. Evans, 
An Initial Investigation for Detecting Partially Spoofed Audio.
TASLP2020
Tomi Kinnunen, Héctor Delgado, Nicholas W. D. Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang 0037, Md. Sahidullah, Junichi Yamagishi, Douglas A. Reynolds, 
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals.
ICASSP2020
Pramod B. Bachhav, Massimiliano Todisco, Nicholas W. D. Evans, 
Artificial Bandwidth Extension Using Conditional Variational Auto-encoders and Adversarial Learning.
Interspeech2020
Andreas Nautsch, Jose Patino 0001, Natalia A. Tomashenko, Junichi Yamagishi, Paul-Gauthier Noé, Jean-François Bonastre, Massimiliano Todisco, Nicholas W. D. Evans, 
The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment.
Interspeech2020
Paul-Gauthier Noé, Jean-François Bonastre, Driss Matrouf, Natalia A. Tomashenko, Andreas Nautsch, Nicholas W. D. Evans, 
Speech Pseudonymisation Assessment Using Voice Similarity Matrices.
Interspeech2020
Hemlata Tak, Jose Patino 0001, Andreas Nautsch, Nicholas W. D. Evans, Massimiliano Todisco, 
Spoofing Attack Detection Using the Non-Linear Fusion of Sub-Band Classifiers.
Interspeech2020
Natalia A. Tomashenko, Brij Mohan Lal Srivastava, Xin Wang 0037, Emmanuel Vincent 0001, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Jose Patino 0001, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco, 
Introducing the VoicePrivacy Initiative.
ICASSP2019
Pramod B. Bachhav, Massimiliano Todisco, Nicholas W. D. Evans, 
Latent Representation Learning for Artificial Bandwidth Extension Using a Conditional Variational Auto-encoder.
Interspeech2019
Andreas Nautsch, Jose Patino 0001, Amos Treiber, Themos Stafylakis, Petr Mizera, Massimiliano Todisco, Thomas Schneider 0003, Nicholas W. D. Evans, 
Privacy-Preserving Speaker Recognition with Cohort Score Normalisation.
Interspeech2019
Andreas Nautsch, Catherine Jasserand, Els Kindt, Massimiliano Todisco, Isabel Trancoso, Nicholas W. D. Evans, 
The GDPR & Speech Data: Reflections of Legal and Technology Communities, First Steps Towards a Common Understanding.
ICASSP2022
Yurii Iotov, Sidsel Marie Nørholm, Valiantsin Belyi, Mads Dyrholm, Mads Græsbøll Christensen, 
Computationally Efficient Fixed-Filter ANC for Speech Based on Long-Term Prediction for Headphone Applications.
ICASSP2022
Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen, 
A Bayesian Permutation Training Deep Representation Learning Method for Speech Enhancement with Variational Autoencoder.
SpeechComm2021
Amir Hossein Poorjam, Mathew Shaji Kavalekalam, Liming Shi, Yordan P. Raykov, Jesper Rindom Jensen, Max A. Little, Mads Græsbøll Christensen, 
Automatic quality control and enhancement for voice-based remote Parkinson's disease detection.
TASLP2021
Liming Shi, Taewoong Lee, Lijun Zhang 0004, Jesper Kjær Nielsen, Mads Græsbøll Christensen, 
Generation of Personal Sound Zones With Physical Meaningful Constraints and Conjugate Gradient Method.
ICASSP2021
Yang Xiang, Liming Shi, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen, 
A Novel NMF-HMM Speech Enhancement Algorithm Based on Poisson Mixture Model.
Interspeech2021
Alfredo Esquivel Jaramillo, Jesper Kjær Nielsen, Mads Græsbøll Christensen, 
Speech Decomposition Based on a Hybrid Speech Model and Optimal Segmentation.
SpeechComm2020
Jesper Rindom Jensen, Sam Karimian-Azari, Mads Græsbøll Christensen, Jacob Benesty, 
Harmonic beamformers for speech enhancement and dereverberation in the time domain.
ICASSP2020
Zihao Cui, Changchun Bao, Jesper Kjær Nielsen, Mads Græsbøll Christensen, 
Autoregressive Parameter Estimation with Dnn-Based Pre-Processing.
ICASSP2020
Liming Shi, Taewoong Lee, Lijun Zhang, Jesper Kjær Nielsen, Mads Græsbøll Christensen, 
A Fast Reduced-Rank Sound Zone Control Algorithm Using The Conjugate Gradient Method.
Interspeech2020
Yang Xiang, Liming Shi, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen, 
An NMF-HMM Speech Enhancement Method Based on Kullback-Leibler Divergence.
TASLP2019
Mathew Shaji Kavalekalam, Jesper Kjær Nielsen, Jesper Bünsow Boldt, Mads Græsbøll Christensen, 
Model-Based Speech Enhancement for Intelligibility Improvement in Binaural Hearing Aids.
TASLP2019
Liming Shi, Jesper Kjær Nielsen, Jesper Rindom Jensen, Max A. Little, Mads Græsbøll Christensen, 
Robust Bayesian Pitch Tracking Based on the Harmonic Model.
ICASSP2019
Alfredo Esquivel Jaramillo, Jesper Kjær Nielsen, Mads Græsbøll Christensen, 
A Study on How Pre-whitening Influences Fundamental Frequency Estimation.
ICASSP2019
Amir Hossein Poorjam, Yordan P. Raykov, Reham Badawy, Jesper Rindom Jensen, Mads Græsbøll Christensen, Max A. Little, 
Quality Control of Voice Recordings in Remote Parkinson's Disease Monitoring Using the Infinite Hidden Markov Model.
Interspeech2019
Charlotte Sørensen, Jesper Bünsow Boldt, Mads Græsbøll Christensen, 
Harmonic Beamformers for Non-Intrusive Speech Intelligibility Prediction.
Interspeech2019
Charlotte Sørensen, Jesper Bünsow Boldt, Mads Græsbøll Christensen, 
Validation of the Non-Intrusive Codebook-Based Short Time Objective Intelligibility Metric for Processed Speech.
SpeechComm2018
Charlotte Sørensen, Mathew Shaji Kavalekalam, Angeliki Xenaki, Jesper Bünsow Boldt, Mads Græsbøll Christensen, 
Non-intrusive codebook-based intelligibility prediction.
ICASSP2018
Mathew Shaji Kavalekalam, Jesper Kjær Nielsen, Mads Græsbøll Christensen, Jesper Bünsow Boldt, 
A Study of Noise PSD Estimators for Single Channel Speech Enhancement.
ICASSP2018
Taewoong Lee, Jesper Kjær Nielsen, Jesper Rindom Jensen, Mads Græsbøll Christensen, 
A Unified Approach to Generating Sound Zones Using Variable Span Linear Filters.
ICASSP2018
Jesper Kjær Nielsen, Mathew Shaji Kavalekalam, Mads Græsbøll Christensen, Jesper Bünsow Boldt, 
Model-Based Noise PSD Estimation from Speech in Non-Stationary Noise.
ICASSP2022
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.
Interspeech2022
Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, Tom Kenter, Alexey Petelin, Jonathan Shen, Vincent Wan, Yu Zhang 0033, Yonghui Wu, Rob Clark, 
Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks.
ICML2022
Chung-Cheng Chiu, James Qin, Yu Zhang, Jiahui Yu, Yonghui Wu, 
Self-supervised learning with random-projection quantizer for speech recognition.
ICASSP2021
Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang 0033, Ye Jia, Ron J. Weiss, Yonghui Wu, 
Parallel Tacotron: Non-Autoregressive and Controllable TTS.
ICASSP2021
Bo Li 0028, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han 0002, Qiao Liang, Yu Zhang 0033, Trevor Strohman, Yonghui Wu, 
A Better and Faster end-to-end Model for Streaming ASR.
ICASSP2021
Jiahui Yu, Chung-Cheng Chiu, Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han 0002, Anmol Gulati, Yonghui Wu, Ruoming Pang, 
FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.
Interspeech2021
Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang 0033, Ye Jia, R. J. Skerry-Ryan, Yonghui Wu, 
Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling.
Interspeech2021
Ye Jia, Heiga Zen, Jonathan Shen, Yu Zhang 0033, Yonghui Wu, 
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS.
ICLR2021
Jiahui Yu, Wei Han 0002, Anmol Gulati, Chung-Cheng Chiu, Bo Li 0028, Tara N. Sainath, Yonghui Wu, Ruoming Pang, 
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling.
ICASSP2020
Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Ruoming Pang, Yanzhang He, Trevor Strohman, Yonghui Wu, 
Towards Fast and Accurate Streaming End-To-End ASR.
ICASSP2020
Daniel S. Park, Yu Zhang 0033, Chung-Cheng Chiu, Youzheng Chen, Bo Li 0028, William Chan, Quoc V. Le, Yonghui Wu, 
Specaugment on Large Scale Datasets.
ICASSP2020
Tara N. Sainath, Yanzhang He, Bo Li 0028, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li 0133, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alexander Gruenstein, Ke Hu, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirkó Visontai, Yonghui Wu, Yu Zhang 0033, Ding Zhao, 
A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency.
ICASSP2020
Guangzhi Sun, Yu Zhang 0033, Ron J. Weiss, Yuan Cao 0007, Heiga Zen, Yonghui Wu, 
Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis.
ICASSP2020
Gary Wang, Andrew Rosenberg, Zhehuai Chen, Yu Zhang 0033, Bhuvana Ramabhadran, Yonghui Wu, Pedro J. Moreno 0001, 
Improving Speech Recognition Using Consistent Predictions on Synthesized Speech.
Interspeech2020
Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang 0033, Jiahui Yu, Wei Han 0002, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang, 
Conformer: Convolution-augmented Transformer for Speech Recognition.
Interspeech2020
Wei Han 0002, Zhengdong Zhang, Yu Zhang 0033, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu, 
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context.
Interspeech2020
Daniel S. Park, Yu Zhang 0033, Ye Jia, Wei Han 0002, Chung-Cheng Chiu, Bo Li 0028, Yonghui Wu, Quoc V. Le, 
Improved Noisy Student Training for Automatic Speech Recognition.
ICASSP2019
Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, Qiao Liang, Deepti Bhatia, Yuan Shangguan, Bo Li 0028, Golan Pundak, Khe Chai Sim, Tom Bagby, Shuo-Yiin Chang, Kanishka Rao, Alexander Gruenstein, 
Streaming End-to-end Speech Recognition for Mobile Devices.
ICASSP2019
Wei-Ning Hsu, Yu Zhang 0033, Ron J. Weiss, Yu-An Chung, Yuxuan Wang 0002, Yonghui Wu, James R. Glass, 
Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization.
ICASSP2019
Bo Li 0028, Yu Zhang 0033, Tara N. Sainath, Yonghui Wu, William Chan, 
Bytes Are All You Need: End-to-end Multilingual Speech Recognition and Synthesis with Bytes.
SpeechComm2021
Tharshini Gunendradasan, Eliathamby Ambikairajah, Julien Epps, Vidhyasaharan Sethu, Haizhou Li 0001, 
An adaptive transmission line cochlear model based front-end for replay attack detection.
Interspeech2021
Beena Ahmed, Kirrie J. Ballard, Denis Burnham, Tharmakulasingam Sirojan, Hadi Mehmood, Dominique Estival, Elise Baker, Felicity Cox, Joanne Arciuli, Titia Benders, Katherine Demuth, Barbara Kelly, Chloé Diskin-Holdaway, Mostafa Ali Shahin, Vidhyasaharan Sethu, Julien Epps, Chwee Beng Lee, Eliathamby Ambikairajah, 
AusKidTalk: An Auditory-Visual Corpus of 3- to 12-Year-Old Australian Children's Speech.
Interspeech2021
Deboshree Bose, Vidhyasaharan Sethu, Eliathamby Ambikairajah, 
Parametric Distributions to Model Numerical Emotion Labels.
ICASSP2020
Eliathamby Ambikairajah, Vidhyasaharan Sethu, 
Cochlear Signal Processing: A Platform for Learning the Fundamentals of Digital Signal Processing.
ICASSP2020
Gajan Suthokumar, Vidhyasaharan Sethu, Kaavya Sriskandaraja, Eliathamby Ambikairajah, 
Adversarial Multi-Task Learning for Speaker Normalization in Replay Detection.
ICASSP2019
Tharshini Gunendradasan, Saad Irtza, Eliathamby Ambikairajah, Julien Epps, 
Transmission Line Cochlear Model Based AM-FM Features for Replay Attack Detection.
ICASSP2019
Gajan Suthokumar, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Chamith Wijenayake, Eliathamby Ambikairajah, 
Phoneme Specific Modelling and Scoring Techniques for Anti Spoofing System.
ICASSP2019
Buddhi Wickramasinghe, Eliathamby Ambikairajah, Julien Epps, Vidhyasaharan Sethu, Haizhou Li 0001, 
Auditory Inspired Spatial Differentiation for Replay Spoofing Attack Detection.
Interspeech2019
Tharshini Gunendradasan, Eliathamby Ambikairajah, Julien Epps, Haizhou Li 0001, 
An Adaptive-Q Cochlear Model for Replay Spoofing Detection.
Interspeech2019
Anda Ouyang, Ting Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah, 
Speech Based Emotion Prediction: Can a Linear Model Work?
Interspeech2019
Buddhi Wickramasinghe, Eliathamby Ambikairajah, Julien Epps, 
Biologically Inspired Adaptive-Q Filterbanks for Replay Spoofing Attack Detection.
SpeechComm2018
Saad Irtza, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Haizhou Li 0001, 
Using language cluster models in hierarchical language identification.
ICASSP2018
Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah, 
Factorized Hidden Variability Learning for Adaptation of Short Duration Language Identification Models.
ICASSP2018
Jianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong-Aik Lee, 
Speaker-Phonetic Vector Estimation for Short Duration Speaker Verification.
Interspeech2018
Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah, 
Sub-band Envelope Features Using Frequency Domain Linear Prediction for Short Duration Language Identification.
Interspeech2018
Tharshini Gunendradasan, Buddhi Wickramasinghe, Phu Ngoc Le, Eliathamby Ambikairajah, Julien Epps, 
Detection of Replay-Spoofing Attacks Using Frequency Modulation Features.
Interspeech2018
Kaavya Sriskandaraja, Vidhyasaharan Sethu, Eliathamby Ambikairajah, 
Deep Siamese Architecture Based Replay Detection for Secure Voice Biometric.
Interspeech2018
Gajan Suthokumar, Vidhyasaharan Sethu, Chamith Wijenayake, Eliathamby Ambikairajah, 
Modulation Dynamic Features for the Detection of Replay Attacks.
Interspeech2018
Buddhi Wickramasinghe, Saad Irtza, Eliathamby Ambikairajah, Julien Epps, 
Frequency Domain Linear Prediction Features for Replay Spoofing Attack Detection.
Interspeech2017
Siyuan Chen, Julien Epps, Eliathamby Ambikairajah, Phu Ngoc Le, 
An Investigation of Crowd Speech for Room Occupancy Estimation.
SpeechComm2023
Daniel R. van Niekerk, Anqi Xu, Branislav Gerazov, Paul Konstantin Krug, Peter Birkholz, Lorna F. Halliday, Santitham Prom-on, Yi Xu 0007, 
Simulating vocal learning of spoken language: Beyond imitation.
TASLP2022
Simon Stone, Yingming Gao, Peter Birkholz, 
Articulatory Synthesis of Vocalized /r/ Allophones in German.
ICASSP2022
Peter Birkholz, P. Häsner, Steffen Kürbis, 
Acoustic Comparison of Physical Vocal Tract Models with Hard and Soft Walls.
ICASSP2022
Hannes Kath, Simon Stone, Stefan Rapp, Peter Birkholz, 
Carina - A Corpus of Aligned German Read Speech Including Annotations.
Interspeech2022
Pouriya Amini Digehsara, João Vítor Possamai de Menezes, Christoph Wagner, Michael Bärhold, Petr Schaffer, Dirk Plettemeier, Peter Birkholz, 
A user-friendly headset for radar-based silent speech recognition.
Interspeech2022
Arne-Lukas Fietkau, Simon Stone, Peter Birkholz, 
Relationship between the acoustic time intervals and tongue movements of German diphthongs.
Interspeech2022
Paul Konstantin Krug, Peter Birkholz, Branislav Gerazov, Daniel Rudolph van Niekerk, Anqi Xu, Yi Xu, 
Articulatory Synthesis for Data Augmentation in Phoneme Recognition.
Interspeech2022
Ingo Langheinrich, Simon Stone, Xinyu Zhang, Peter Birkholz, 
Glottal inverse filtering based on articulatory synthesis and deep learning.
Interspeech2022
Leon Liebig, Christoph Wagner, Alexander Mainka, Peter Birkholz, 
An investigation of regression-based prediction of the femininity or masculinity in speech of transgender people.
Interspeech2022
João Vítor Menezes, Pouriya Amini Digehsara, Christoph Wagner, Marco Mütze, Michael Bärhold, Petr Schaffer, Dirk Plettemeier, Peter Birkholz, 
Evaluation of different antenna types and positions in a stepped frequency continuous-wave radar-based silent speech interface.
Interspeech2022
Debasish Ray Mohapatra, Mario Fleischer, Victor Zappi, Peter Birkholz, Sidney S. Fels, 
Three-dimensional finite-difference time-domain acoustic analysis of simplified vocal tract shapes.
Interspeech2022
Daniel R. van Niekerk, Anqi Xu, Branislav Gerazov, Paul Konstantin Krug, Peter Birkholz, Yi Xu, 
Exploration strategies for articulatory synthesis of complex syllable onsets.
Interspeech2022
Yi Xu, Anqi Xu, Daniel R. van Niekerk, Branislav Gerazov, Peter Birkholz, Paul Konstantin Krug, Santitham Prom-on, Lorna F. Halliday, 
Evoc-Learn - High quality simulation of early vocal learning.
SpeechComm2021
Peter Birkholz, Susanne Drechsel, 
Effects of the piriform fossae, transvelar acoustic coupling, and laryngeal wall vibration on the naturalness of articulatory speech synthesis.
Interspeech2021
Rémi Blandin, Marc Arnela, Simon Félix, Jean-Baptiste Doc, Peter Birkholz, 
Comparison of the Finite Element Method, the Multimodal Method and the Transmission-Line Model for the Computation of Vocal Tract Transfer Functions.
Interspeech2021
Alexander Wilbrandt, Simon Stone, Peter Birkholz, 
Articulatory Data Recorder: A Framework for Real-Time Articulatory Data Recording.
Interspeech2021
Anqi Xu, Daniel R. van Niekerk, Branislav Gerazov, Paul Konstantin Krug, Santitham Prom-on, Peter Birkholz, Yi Xu, 
Model-Based Exploration of Linking Between Vowel Articulatory Space and Acoustic Space.
SpeechComm2020
Thuan Van Ngo, Masato Akagi, Peter Birkholz, 
Effect of articulatory and acoustic features on the intelligibility of speech in noise: An articulatory synthesis study.
ICASSP2020
Peter Birkholz, Xinyu Zhang, 
Accounting for Microprosody in Modeling Intonation.
ICASSP2020
Simon Stone, Peter Birkholz, 
Cross-Speaker Silent-Speech Command Word Recognition Using Electro-Optical Stomatography.
ICASSP2022
Harishchandra Dubey, Vishak Gopal, Ross Cutler, Ashkan Aazami, Sergiy Matusevych, Sebastian Braun, Sefik Emre Eskimez, Manthan Thakker, Takuya Yoshioka, Hannes Gamper, Robert Aichner, 
ICASSP 2022 Deep Noise Suppression Challenge.
ICASSP2022
Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang 0009, Zhuo Chen 0006, Xuedong Huang 0001, 
Personalized Speech Enhancement: New Models and Comprehensive Evaluation.
ICASSP2022
Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.
ICASSP2022
Heming Wang, Yao Qian, Xiaofei Wang 0009, Yiming Wang, Chengyi Wang 0002, Shujie Liu 0001, Takuya Yoshioka, Jinyu Li 0001, DeLiang Wang, 
Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction.
ICASSP2022
Yixuan Zhang, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li 0001, 
Continuous Speech Separation with Recurrent Selective Attention Network.
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
Interspeech2022
Manthan Thakker, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, 
Fast Real-time Personalized Speech Enhancement: End-to-End Enhancement Network (E3Net) and Knowledge Distillation.
Interspeech2022
Xiaofei Wang 0009, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka, 
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation.
Interspeech2022
Wangyou Zhang, Zhuo Chen 0006, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei, 
Separating Long-Form Speech with Group-wise Permutation Invariant Training.
ICASSP2021
Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Takuya Yoshioka, Shujie Liu 0001, Jinyu Li 0001, Xiangzhan Yu, 
Don't Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer.
ICASSP2021
Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Jinyu Li 0001, Takuya Yoshioka, Chengyi Wang 0002, Shujie Liu 0001, Ming Zhou 0001, 
Continuous Speech Separation with Conformer.
ICASSP2021
Naoyuki Kanda, Zhong Meng, Liang Lu 0001, Yashesh Gaur, Xiaofei Wang 0009, Zhuo Chen 0006, Takuya Yoshioka, 
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.
ICASSP2021
Xiong Xiao, Naoyuki Kanda, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao 0008, Gang Liu, Yu Wu 0012, Jian Wu 0027, Shujie Liu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.
Interspeech2021
Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Shujie Liu 0001, Jinyu Li 0001, Xiangzhan Yu, 
Ultra Fast Speech Separation Model with Teacher Student Learning.
Interspeech2021
Sefik Emre Eskimez, Xiaofei Wang 0009, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen 0006, Huaming Wang, Takuya Yoshioka, 
Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement.
Interspeech2021
Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
End-to-End Speaker-Attributed ASR with Transformer.
Interspeech2021
Naoyuki Kanda, Guoli Ye, Yu Wu 0012, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone.
Interspeech2021
Jian Wu 0027, Zhuo Chen 0006, Sanyuan Chen, Yu Wu 0012, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, 
Investigation of Practical Aspects of Single Channel Speech Separation for ASR.
ICASSP2020
Zhuo Chen 0006, Takuya Yoshioka, Liang Lu 0001, Tianyan Zhou, Zhong Meng, Yi Luo 0004, Jian Wu 0027, Xiong Xiao, Jinyu Li 0001, 
Continuous Speech Separation: Dataset and Analysis.
ICASSP2022
Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.
ICASSP2022
Yixuan Zhang, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li 0001, 
Continuous Speech Separation with Recurrent Selective Attention Network.
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
Interspeech2022
Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li 0001, Xie Chen 0001, Yu Wu 0012, Yifan Gong 0001, 
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.
Interspeech2022
Wangyou Zhang, Zhuo Chen 0006, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei, 
Separating Long-Form Speech with Group-wise Permutation Invariant Training.
ICASSP2021
Naoyuki Kanda, Zhong Meng, Liang Lu 0001, Yashesh Gaur, Xiaofei Wang 0009, Zhuo Chen 0006, Takuya Yoshioka, 
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.
ICASSP2021
Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu 0001, Xie Chen 0001, Jinyu Li 0001, Yifan Gong 0001, 
Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition.
ICASSP2021
Eric Sun, Liang Lu 0001, Zhong Meng, Yifan Gong 0001, 
Sequence-Level Self-Teaching Regularization.
Interspeech2021
Liang Lu 0001, Zhong Meng, Naoyuki Kanda, Jinyu Li 0001, Yifan Gong 0001, 
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer.
Interspeech2021
Yan Deng, Rui Zhao 0017, Zhong Meng, Xie Chen 0001, Bing Liu, Jinyu Li 0001, Yifan Gong 0001, Lei He 0005, 
Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS.
Interspeech2021
Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
End-to-End Speaker-Attributed ASR with Transformer.
Interspeech2021
Naoyuki Kanda, Guoli Ye, Yu Wu 0012, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone.
Interspeech2021
Zhong Meng, Yu Wu 0012, Naoyuki Kanda, Liang Lu 0001, Xie Chen 0001, Guoli Ye, Eric Sun, Jinyu Li 0001, Yifan Gong 0001, 
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition.
Interspeech2021
Eric Sun, Jinyu Li 0001, Zhong Meng, Yu Wu 0012, Jian Xue, Shujie Liu 0001, Yifan Gong 0001, 
Improving Multilingual Transformer Transducer Models by Reducing Language Confusions.
ICASSP2020
Zhuo Chen 0006, Takuya Yoshioka, Liang Lu 0001, Tianyan Zhou, Zhong Meng, Yi Luo 0004, Jian Wu 0027, Xiong Xiao, Jinyu Li 0001, 
Continuous Speech Separation: Dataset and Analysis.
ICASSP2020
Jinyu Li 0001, Rui Zhao 0017, Eric Sun, Jeremy H. M. Wong, Amit Das, Zhong Meng, Yifan Gong 0001, 
High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model.
ICASSP2020
Zhong Meng, Hu Hu, Jinyu Li 0001, Changliang Liu, Yan Huang 0028, Yifan Gong 0001, Chin-Hui Lee, 
L-Vector: Neural Label Embedding for Domain Adaptation.
Interspeech2020
Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, 
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers.
Interspeech2020
Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Takuya Yoshioka, 
Serialized Output Training for End-to-End Overlapped Speech Recognition.
SpeechComm2023
Bence Mark Halpern, Siyuan Feng 0001, Rob van Son, Michiel W. M. van den Brekel, Odette Scharenborg, 
Automatic evaluation of spontaneous oral cancer speech using ratings from naive listeners.
SpeechComm2022
Bence Mark Halpern, Siyuan Feng 0001, Rob van Son, Michiel W. M. van den Brekel, Odette Scharenborg, 
Low-resource automatic speech recognition and error analyses of oral cancer speech.
ICASSP2022
Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
The First Multimodal Information Based Speech Processing (Misp) Challenge: Data, Tasks, Baselines And Results.
Interspeech2022
Hang Chen, Jun Du, Yusheng Dai, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Baocai Yin, Jia Pan, 
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.
Interspeech2022
Tanvina Patel, Odette Scharenborg, 
Using cross-model learnings for the Gram Vaani ASR Challenge 2022.
Interspeech2022
Luke Prananta, Bence Mark Halpern, Siyuan Feng 0001, Odette Scharenborg, 
The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition.
Interspeech2022
Yuanyuan Zhang, Yixuan Zhang, Bence Mark Halpern, Tanvina Patel, Odette Scharenborg, 
Mitigating bias against non-native accents.
Interspeech2022
Hengshun Zhou, Jun Du, Gongzhen Zou, Zhaoxu Nian, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Shifu Xiong, Jianqing Gao, 
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis.
SpeechComm2021
Polina Drozdova, Roeland van Hout, Sven L. Mattys, Odette Scharenborg, 
The effect of intermittent noise on lexically-guided perceptual learning in native and non-native listening.
TASLP2021
Xinsheng Wang, Tingting Qiao, Jihua Zhu, Alan Hanjalic, Odette Scharenborg, 
Generating Images From Spoken Descriptions.
ICASSP2021
Siyuan Feng 0001, Piotr Zelasko, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak, 
How Phonotactics Affect Multilingual and Zero-Shot ASR Performance.
ICASSP2021
Xinsheng Wang, Siyuan Feng 0001, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg, 
Show and Speak: Directly Synthesize Spoken Description of Images.
Interspeech2021
Siyuan Feng 0001, Piotr Zelasko, Laureano Moro-Velázquez, Odette Scharenborg, 
Unsupervised Acoustic Unit Discovery by Leveraging a Language-Independent Subword Discriminative Feature Representation.
TASLP2020
Odette Scharenborg, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller 0001, 
Speech Technology for Unwritten Languages.
Interspeech2020
Siyuan Feng 0001, Odette Scharenborg, 
Unsupervised Subword Modeling Using Autoregressive Pretraining and Cross-Lingual Phone-Aware Modeling.
Interspeech2020
Bence Mark Halpern, Rob van Son, Michiel W. M. van den Brekel, Odette Scharenborg, 
Detecting and Analysing Spontaneous Oral Cancer Speech in the Wild.
Interspeech2020
Justin van der Hout, Zoltán D'Haese, Mark Hasegawa-Johnson, Odette Scharenborg, 
Evaluating Automatically Generated Phoneme Captions for Images.
Interspeech2020
Xinsheng Wang, Tingting Qiao, Jihua Zhu, Alan Hanjalic, Odette Scharenborg, 
S2IGAN: Speech-to-Image Generation via Adversarial Learning.
Interspeech2020
Piotr Zelasko, Laureano Moro-Velázquez, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak, 
That Sounds Familiar: An Analysis of Phonetic Representations Transfer Across Languages.
SpeechComm2019
Odette Scharenborg, Marjolein van Os, 
Why listening in background noise is harder in a non-native language than in a native language: A review.
SpeechComm2022
Martha Yifiru Tachbelie, Solomon Teferra Abate, Tanja Schultz, 
Multilingual speech recognition for GlobalPhone languages.
ICASSP2022
Ayimnisagul Ablimit, Catarina Botelho, Alberto Abad, Tanja Schultz, Isabel Trancoso, 
Exploring Dementia Detection from Speech: Cross Corpus Analysis.
ICASSP2022
Miguel Angrick, Maarten C. Ottenhoff, Lorenz Diener, Darius Ivucic, Gabriel Ivucic, Sophocles Goulis, Albert J. Colon, G. Louis Wagner, Dean J. Krusienski, Pieter L. Kubben, Tanja Schultz, Christian Herff, 
Towards Closed-Loop Speech Synthesis from Stereotactic EEG: A Unit Selection Approach.
ICASSP2022
Marvin Borsdorf, Kevin Scheck, Haizhou Li 0001, Tanja Schultz, 
Experts Versus All-Rounders: Target Language Extraction for Multiple Target Languages.
ICASSP2022
Sreeja Manghat, Sreeram Manghat, Tanja Schultz, 
Hybrid sub-word segmentation for handling long tail in morphologically rich low resource languages.
ICASSP2022
Kun Qian 0003, Tanja Schultz, Björn W. Schuller, 
An Overview of the FIRST ICASSP Special Session on Computer Audition for Healthcare.
Interspeech2022
Ayimnisagul Ablimit, Karen Scholz, Tanja Schultz, 
Deep Learning Approaches for Detecting Alzheimer's Dementia from Conversational Speech of ILSE Study.
Interspeech2022
Marvin Borsdorf, Kevin Scheck, Haizhou Li 0001, Tanja Schultz, 
Blind Language Separation: Disentangling Multilingual Cocktail Party Voices by Language.
Interspeech2022
Catarina Botelho, Tanja Schultz, Alberto Abad, Isabel Trancoso, 
Challenges of using longitudinal and cross-domain corpora on studies of pathological speech.
Interspeech2022
Sreeram Manghat, Sreeja Manghat, Tanja Schultz, 
Normalization of code-switched text for speech synthesis.
ICASSP2021
Solomon Teferra Abate, Martha Yifiru Tachbelie, Tanja Schultz, 
End-to-End Multilingual Automatic Speech Recognition for Less-Resourced Languages: The Case of Four Ethiopian Languages.
Interspeech2021
Marvin Borsdorf, Chenglin Xu, Haizhou Li 0001, Tanja Schultz, 
Universal Speaker Extraction in the Presence and Absence of Target Speakers for Speech of One and Two Talkers.
Interspeech2021
Marvin Borsdorf, Chenglin Xu, Haizhou Li 0001, Tanja Schultz, 
GlobalPhone Mix-To-Separate Out of 2: A Multilingual 2000 Speakers Mixtures Database for Speech Separation.
Interspeech2021
Catarina Botelho, Alberto Abad, Tanja Schultz, Isabel Trancoso, 
Visual Speech for Obstructive Sleep Apnea Detection.
Interspeech2021
Lars Steinert, Felix Putze, Dennis Küster, Tanja Schultz, 
Audio-Visual Recognition of Emotional Engagement of People with Dementia.
ICASSP2020
Solomon Teferra Abate, Martha Yifiru Tachbelie, Tanja Schultz, 
Deep Neural Networks Based Automatic Speech Recognition for Four Ethiopian Languages.
ICASSP2020
Martha Yifiru Tachbelie, Ayimunishagu Abulimiti, Solomon Teferra Abate, Tanja Schultz, 
DNN-Based Speech Recognition for Globalphone Languages.
Interspeech2020
Solomon Teferra Abate, Martha Yifiru Tachbelie, Tanja Schultz, 
Multilingual Acoustic and Language Modeling for Ethio-Semitic Languages.
Interspeech2020
Ayimunishagu Abulimiti, Jochen Weiner, Tanja Schultz, 
Automatic Speech Recognition for ILSE-Interviews: Longitudinal Conversational Speech Recordings Covering Aging and Cognitive Decline.
Interspeech2020
Miguel Angrick, Christian Herff, Garett D. Johnson, Jerry J. Shih, Dean J. Krusienski, Tanja Schultz, 
Speech Spectrogram Estimation from Intracranial Brain Activity Using a Quantization Approach.
ICASSP2022
Iuliia Nigmatulina, Juan Zuluaga-Gomez, Amrutha Prasad, Seyyed Saeed Sarfjoo, Petr Motlícek, 
A Two-Step Approach to Leverage Contextual Data: Speech Recognition in Air-Traffic Communications.
ICASSP2021
Rudolf A. Braun, Srikanth R. Madikeri, Petr Motlícek, 
A Comparison of Methods for OOV-Word Recognition on a New Public Dataset.
Interspeech2021
Maël Fabien, Shantipriya Parida, Petr Motlícek, Dawei Zhu, Aravind Krishnan, Hoang H. Nguyen, 
ROXANNE Research Platform: Automate Criminal Investigations.
Interspeech2021
Weipeng He, Petr Motlícek, Jean-Marc Odobez, 
Multi-Task Neural Network for Robust Multiple Speaker Embedding Extraction.
Interspeech2021
Martin Kocour, Karel Veselý, Alexander Blatt, Juan Zuluaga-Gomez, Igor Szöke, Jan Cernocký, Dietrich Klakow, Petr Motlícek, 
Boosting of Contextual Information in ASR for Air-Traffic Call-Sign Recognition.
Interspeech2021
Srikanth R. Madikeri, Petr Motlícek, Hervé Bourlard, 
Multitask Adaptation with Lattice-Free MMI for Multi-Genre Speech Recognition of Low Resource Languages.
Interspeech2021
Oliver Ohneiser, Seyyed Saeed Sarfjoo, Hartmut Helmke, Shruthi Shetty, Petr Motlícek, Matthias Kleinert, Heiko Ehr, Sarunas Murauskas, 
Robust Command Recognition for Lithuanian Air Traffic Control Tower Utterances.
Interspeech2021
Seyyed Saeed Sarfjoo, Srikanth R. Madikeri, Petr Motlícek, 
Speech Activity Detection Based on Multilingual Speech Recognition System.
Interspeech2021
Esaú Villatoro-Tello, S. Pavankumar Dubagunta, Julian Fritsch, Gabriela Ramírez-de-la-Rosa, Petr Motlícek, Mathew Magimai-Doss, 
Late Fusion of the Available Lexicon and Raw Waveform-Based Acoustic Modeling for Depression and Dementia Recognition.
Interspeech2021
Juan Zuluaga-Gomez, Iuliia Nigmatulina, Amrutha Prasad, Petr Motlícek, Karel Veselý, Martin Kocour, Igor Szöke, 
Contextual Semi-Supervised Learning: An Approach to Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems.
ICASSP2020
Banriskhem K. Khonglah, Srikanth R. Madikeri, Subhadeep Dey, Hervé Bourlard, Petr Motlícek, Jayadev Billa, 
Incremental Semi-Supervised Learning for Multi-Genre Speech Recognition.
Interspeech2020
Srikanth R. Madikeri, Banriskhem K. Khonglah, Sibo Tong, Petr Motlícek, Hervé Bourlard, Daniel Povey, 
Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems.
Interspeech2020
Seyyed Saeed Sarfjoo, Srikanth R. Madikeri, Petr Motlícek, Sébastien Marcel, 
Supervised Domain Adaptation for Text-Independent Speaker Verification Using Limited Data.
Interspeech2020
Juan Zuluaga-Gomez, Petr Motlícek, Qingran Zhan, Karel Veselý, Rudolf A. Braun, 
Automatic Speech Recognition Benchmark for Air-Traffic Communications.
ICASSP2019
Srikanth R. Madikeri, Petr Motlícek, Subhadeep Dey, 
A Bayesian Approach to Inter-task Fusion for Speaker Recognition.
Interspeech2019
Subhadeep Dey, Petr Motlícek, Trung Bui, Franck Dernoncourt, 
Exploiting Semi-Supervised Training Through a Dropout Regularization in End-to-End Speech Recognition.
Interspeech2019
Thibault Viglino, Petr Motlícek, Milos Cernak, 
End-to-End Accented Speech Recognition.
ICASSP2018
Subhadeep Dey, Takafumi Koshinaka, Petr Motlícek, Srikanth R. Madikeri, 
DNN Based Speaker Embedding Using Content Information for Text-Dependent Speaker Verification.
Interspeech2018
Subhadeep Dey, Srikanth R. Madikeri, Petr Motlícek, 
End-to-end Text-dependent Speaker Verification Using Novel Distance Measures.
Interspeech2018
Weipeng He, Petr Motlícek, Jean-Marc Odobez, 
Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network.
Interspeech2022
Alexander H. Liu, Cheng-I Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James R. Glass, 
Simple and Effective Unsupervised Speech Synthesis.
Interspeech2022
Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino 0001, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee 0001, 
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation.
Interspeech2022
Bowen Shi, Wei-Ning Hsu, Abdelrahman Mohamed, 
Robust Self-Supervised Audio-Visual Speech Recognition.
Interspeech2022
Bowen Shi, Abdelrahman Mohamed, Wei-Ning Hsu, 
Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT.
Interspeech2022
Apoorv Vyas, Wei-Ning Hsu, Michael Auli, Alexei Baevski, 
On-demand compute reduction with stochastic wav2vec 2.0.
ICML2022
Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli, 
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language.
NeurIPS2022
Wei-Ning Hsu, Bowen Shi, 
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality.
ICLR2022
Bowen Shi, Wei-Ning Hsu, Kushal Lakhotia, Abdelrahman Mohamed, 
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction.
ACL2022
Eugene Kharitonov, Ann Lee 0001, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu, 
Text-Free Prosody-Aware Generative Spoken Language Modeling.
ACL2022
Ann Lee 0001, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Sravya Popuri, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang 0002, Juan Pino 0001, Wei-Ning Hsu, 
Direct Speech-to-Speech Translation With Discrete Units.
ACL2022
Yun Tang 0002, Hongyu Gong, Ning Dong, Changhan Wang, Wei-Ning Hsu, Jiatao Gu, Alexei Baevski, Xian Li, Abdelrahman Mohamed, Michael Auli, Juan Miguel Pino, 
Unified Speech-Text Pre-training for Speech Translation and Recognition.
NAACL2022
Ann Lee 0001, Hongyu Gong, Paul-Ambroise Duquenne, Holger Schwenk, Peng-Jen Chen, Changhan Wang, Sravya Popuri, Yossi Adi, Juan Miguel Pino, Jiatao Gu, Wei-Ning Hsu, 
Textless Speech-to-Speech Translation on Real Data.
TASLP2021
Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed, 
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units.
ICASSP2021
Wei-Ning Hsu, Yao-Hung Hubert Tsai, Benjamin Bolte, Ruslan Salakhutdinov, Abdelrahman Mohamed, 
Hubert: How Much Can a Bad Teacher Benefit ASR Pre-Training?
Interspeech2021
Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee 0001, Ronan Collobert, Gabriel Synnaeve, Michael Auli, 
Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training.
Interspeech2021
Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, 
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.
NeurIPS2021
Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli, 
Unsupervised Speech Recognition.
ACL2021
Wei-Ning Hsu, David Harwath, Tyler Miller, Christopher Song, James R. Glass, 
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units.
Interspeech2020
Michael Gump, Wei-Ning Hsu, James R. Glass, 
Unsupervised Methods for Evaluating Speech Representations.
Interspeech2020
Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James R. Glass, 
A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning.
TASLP2022
Weiqing Wang, Qingjian Lin, Danwei Cai, Ming Li 0026, 
Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization.
ICASSP2022
Qingjian Li, Lin Yang, Xuyang Wang, Xiaoyi Qin, Junjie Wang, Ming Li 0026, 
Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification.
ICASSP2022
Weiqing Wang, Ming Li 0026, 
Incorporating End-to-End Framework Into Target-Speaker Voice Activity Detection.
ICASSP2022
Weiqing Wang, Xiaoyi Qin, Ming Li 0026, 
Cross-Channel Attention-Based Target Speaker Voice Activity Detection: Experimental Results for the M2met Challenge.
Interspeech2022
Xiaoyi Qin, Na Li 0012, Chao Weng, Dan Su 0002, Ming Li 0026, 
Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings.
Interspeech2022
Weiqing Wang, Ming Li 0026, Qingjian Lin, 
Online Target Speaker Voice Activity Detection for Speaker Diarization.
Interspeech2022
Xingming Wang, Xiaoyi Qin, Yikang Wang, Yunfei Xu, Ming Li 0026, 
The DKU-OPPO System for the 2022 Spoofing-Aware Speaker Verification Challenge.
ICASSP2021
Danwei Cai, Weiqing Wang, Ming Li 0026, 
An Iterative Framework for Self-Supervised Deep Speaker Representation Learning.
Interspeech2021
Yan Jia, Xingming Wang, Xiaoyi Qin, Yinping Zhang, Xuyang Wang, Junjie Wang, Dong Zhang, Ming Li 0026, 
The 2020 Personalized Voice Trigger Challenge: Open Datasets, Evaluation Metrics, Baseline System and Results.
Interspeech2021
Xiaoyi Qin, Chao Wang, Yong Ma, Min Liu, Shilei Zhang, Ming Li 0026, 
Our Learned Lessons from Cross-Lingual Speaker Verification: The CRMI-DKU System Description for the Short-Duration Speaker Verification Challenge 2021.
Interspeech2021
Yao Shi, Hui Bu, Xin Xu, Shaoji Zhang, Ming Li 0026, 
AISHELL-3: A Multi-Speaker Mandarin TTS Corpus.
Interspeech2021
Weiqing Wang, Danwei Cai, Jin Wang, Qingjian Lin, Xuyang Wang, Mi Hong, Ming Li 0026, 
The DKU-Duke-Lenovo System Description for the Fearless Steps Challenge Phase III.
Interspeech2021
Tinglong Zhu, Xiaoyi Qin, Ming Li 0026, 
Binary Neural Network for Speaker Verification.
ICASSP2020
Danwei Cai, Weicheng Cai, Ming Li 0026, 
Within-Sample Variability-Invariant Loss for Robust Speaker Recognition Under Noisy Environments.
ICASSP2020
Xiaoyi Qin, Hui Bu, Ming Li 0026, 
HI-MIA: A Far-Field Text-Dependent Speaker Verification Database and the Baselines.
Interspeech2020
Zexin Cai, Chuxiong Zhang, Ming Li 0026, 
From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint.
Interspeech2020
Tingle Li, Qingjian Lin, Yuanyuan Bao, Ming Li 0026, 
Atss-Net: Target Speaker Separation via Attention-Based Neural Network.
Interspeech2020
Qingjian Lin, Yu Hou, Ming Li 0026, 
Self-Attentive Similarity Measurement Strategies in Speaker Diarization.
Interspeech2020
Qingjian Lin, Tingle Li, Ming Li 0026, 
The DKU Speech Activity Detection and Speaker Identification Systems for Fearless Steps Challenge Phase-02.
Interspeech2020
Xiaoyi Qin, Ming Li 0026, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, Haizhou Li 0001, 
The INTERSPEECH 2020 Far-Field Speaker Verification Challenge.
ICASSP2022
Yuanchao Li, Peter Bell 0001, Catherine Lai, 
Fusing ASR Outputs in Joint Training for Speech Emotion Recognition.
Interspeech2022
Ondrej Klejch, Electra Wallington, Peter Bell 0001, 
Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR.
Interspeech2022
Chau Luu, Steve Renals, Peter Bell 0001, 
Investigating the contribution of speaker attributes to speaker separability using disentangled speaker representations.
Interspeech2022
Sarenne Carrol Wallbridge, Catherine Lai, Peter Bell 0001, 
Investigating perception of spoken dialogue acceptability through surprisal.
ICASSP2021
Erfan Loweimi, Zoran Cvetkovic, Peter Bell 0001, Steve Renals, 
Speech Acoustic Modelling from Raw Phase Spectrum.
ICASSP2021
Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell 0001, Steve Renals, 
Train Your Classifier First: Cascade Neural Networks Training from Upper Layers to Lower Layers.
Interspeech2021
Ondrej Klejch, Electra Wallington, Peter Bell 0001, 
The CSTR System for Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages.
Interspeech2021
Erfan Loweimi, Zoran Cvetkovic, Peter Bell 0001, Steve Renals, 
Speech Acoustic Modelling Using Raw Source and Filter Components.
Interspeech2021
Chau Luu, Peter Bell 0001, Steve Renals, 
Leveraging Speaker Attribute Information Using Multi Task Learning for Speaker Verification and Diarization.
Interspeech2021
Sarenne Wallbridge, Peter Bell 0001, Catherine Lai, 
It's Not What You Said, it's How You Said it: Discriminative Perception of Speech as a Multichannel Communication System.
Interspeech2021
Electra Wallington, Benji Kershenbaum, Ondrej Klejch, Peter Bell 0001, 
On the Learning Dynamics of Semi-Supervised Training for ASR.
Interspeech2021
Shucong Zhang, Erfan Loweimi, Peter Bell 0001, Steve Renals, 
Stochastic Attention Head Removal: A Simple and Effective Method for Improving Transformer Based ASR Models.
ICASSP2020
Alberto Abad, Peter Bell 0001, Andrea Carmantini, Steve Renals, 
Cross Lingual Transfer Learning for Zero-Resource Domain Adaptation.
ICASSP2020
Chau Luu, Peter Bell 0001, Steve Renals, 
Channel Adversarial Training for Speaker Verification and Diarization.
ICASSP2020
Joanna Rownicka, Peter Bell 0001, Steve Renals, 
Multi-Scale Octave Convolutions for Robust Speech Recognition.
Interspeech2020
Neethu M. Joy, Dino Oglic, Zoran Cvetkovic, Peter Bell 0001, Steve Renals, 
Deep Scattering Power Spectrum Features for Robust Speech Recognition.
Interspeech2020
Erfan Loweimi, Peter Bell 0001, Steve Renals, 
On the Robustness and Training Dynamics of Raw Waveform Models.
Interspeech2020
Erfan Loweimi, Peter Bell 0001, Steve Renals, 
Raw Sign and Magnitude Spectra for Multi-Head Acoustic Modelling.
Interspeech2020
Dino Oglic, Zoran Cvetkovic, Peter Bell 0001, Steve Renals, 
A Deep 2D Convolutional Network for Waveform-Based Speech Recognition.
ICASSP2019
Shucong Zhang, Erfan Loweimi, Peter Bell 0001, Steve Renals, 
Windowed Attention Mechanisms for Speech Recognition.
ICASSP2022
Lisong Chen, Peilin Zhou, Yuexian Zou, 
Joint Multiple Intent Detection and Slot Filling Via Self-Distillation.
ICASSP2022
Dongchao Yang, Helin Wang, Yuexian Zou, Zhongjie Ye, Wenwu Wang 0001, 
A Mutual Learning Framework for Few-Shot Sound Event Detection.
Interspeech2022
Jinchuan Tian, Jianwei Yu, Chunlei Zhang, Yuexian Zou, Dong Yu 0001, 
LAE: Language-Aware Encoder for Monolingual and Multilingual ASR.
Interspeech2022
Helin Wang, Dongchao Yang, Chao Weng, Jianwei Yu, Yuexian Zou, 
Improving Target Sound Extraction with Timestamp Information.
Interspeech2022
Yifei Xin, Dongchao Yang, Yuexian Zou, 
Audio Pyramid Transformer with Domain Adaption for Weakly Supervised Sound Event Detection and Audio Classification.
Interspeech2022
Dongchao Yang, Helin Wang, Zhongjie Ye, Yuexian Zou, Wenwu Wang 0001, 
RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection.
Interspeech2022
Zifeng Zhao, Rongzhi Gu, Dongchao Yang, Jinchuan Tian, Yuexian Zou, 
Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction.
Interspeech2022
Zifeng Zhao, Dongchao Yang, Rongzhi Gu, Haoran Zhang, Yuexian Zou, 
Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches.
ICASSP2021
Nuo Chen, Fenglin Liu, Chenyu You, Peilin Zhou, Yuexian Zou, 
Adaptive Bi-Directional Attention: Exploring Multi-Granularity Representations for Machine Reading Comprehension.
ICASSP2021
Zhiqi Huang, Fenglin Liu, Peilin Zhou, Yuexian Zou, 
Sentiment Injected Iteratively Co-Interactive Network for Spoken Language Understanding.
ICASSP2021
Helin Wang, Yuexian Zou, Wenwu Wang 0001, 
A Global-Local Attention Framework for Weakly Labelled Audio Tagging.
ICASSP2021
Liyu Wu, Yuexian Zou, Can Zhang 0001, 
Long-Short Temporal Modeling for Efficient Action Recognition.
ICASSP2021
Haoran Zhang, Yuexian Zou, Helin Wang, 
Contrastive Self-Supervised Learning for Text-Independent Speaker Verification.
Interspeech2021
Nuo Chen, Chenyu You, Yuexian Zou, 
Self-Supervised Dialogue Learning for Spoken Conversational Question Answering.
Interspeech2021
Li Wang, Rongzhi Gu, Nuo Chen, Yuexian Zou, 
Text Anchor Based Metric Learning for Small-Footprint Keyword Spotting.
Interspeech2021
Helin Wang, Yuexian Zou, Wenwu Wang 0001, 
SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification.
Interspeech2021
Weiyuan Xu, Peilin Zhou, Chenyu You, Yuexian Zou, 
Semantic Transportation Prototypical Network for Few-Shot Intent Detection.
Interspeech2021
Dongchao Yang, Helin Wang, Yuexian Zou, 
Unsupervised Multi-Target Domain Adaptation for Acoustic Scene Classification.
Interspeech2021
Chenyu You, Nuo Chen, Yuexian Zou, 
Contextualized Attention-Based Knowledge Transfer for Spoken Conversational Question Answering.
AAAI2021
Zhiqi Huang, Fenglin Liu, Xian Wu, Shen Ge, Helin Wang, Wei Fan 0001, Yuexian Zou, 
Audio-Oriented Multimodal Machine Comprehension via Dynamic Inter- and Intra-modality Attention.
ICASSP2022
Chang-Ting Chu, Mahdin Rohmatillah, Ching-Hsien Lee, Jen-Tzung Chien, 
Augmentation Strategy Optimization for Language Understanding.
ICASSP2022
Hou Lio, Shang-En Li, Jen-Tzung Chien, 
Adversarial Mask Transformer for Sequential Learning.
Interspeech2022
Jen-Tzung Chien, Yu-Han Huang, 
Bayesian Transformer Using Disentangled Mask Attention.
ICASSP2021
Tien-Ching Luo, Jen-Tzung Chien, 
Variational Dialogue Generation with Normalizing Flows.
Interspeech2021
Chi-Hang Leong, Yu-Han Huang, Jen-Tzung Chien, 
Online Compressive Transformer for End-to-End Speech Recognition.
Interspeech2021
Mahdin Rohmatillah, Jen-Tzung Chien, 
Causal Confusion Reduction for Robust Multi-Domain Dialogue Policy.
TASLP2020
Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien, 
Variational Domain Adversarial Learning With Mutual Information Maximization for Speaker Verification.
Interspeech2020
Jen-Tzung Chien, Yu-Min Huang, 
Stochastic Convolutional Recurrent Networks for Language Modeling.
Interspeech2020
Jen-Tzung Chien, Po-Chien Hsu, 
Stochastic Curiosity Exploration for Dialogue Systems.
Interspeech2020
Weiwei Lin 0002, Man-Wai Mak, Jen-Tzung Chien, 
Strategies for End-to-End Text-Independent Speaker Verification.
ICASSP2019
Jen-Tzung Chien, Che-Yu Kuo, 
Stochastic Markov Recurrent Neural Network for Source Separation.
ICASSP2019
Jen-Tzung Chien, Chun-Wei Wang, 
Variational and Hierarchical Recurrent Autoencoder.
ICASSP2019
Weiwei Lin 0002, Man-Wai Mak, Youzhi Tu, Jen-Tzung Chien, 
Semi-supervised Nuisance-attribute Networks for Domain Adaptation.
Interspeech2019
Jen-Tzung Chien, Wei Xiang Lieow, 
Meta Learning for Hyperparameter Optimization in Dialogue System.
Interspeech2019
Jen-Tzung Chien, Chun-Wei Wang, 
Self Attention in Variational Sequential Learning for Summarization.
Interspeech2019
Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien, 
Variational Domain Adversarial Learning for Speaker Verification.
TASLP2018
Jen-Tzung Chien, 
Bayesian Nonparametric Learning for Hierarchical and Sparse Topics.
TASLP2018
Weiwei Lin 0002, Man-Wai Mak, Jen-Tzung Chien, 
Multisource I-Vectors Domain Adaptation Using Maximum Mean Discrepancy Based Autoencoders.
ICASSP2018
Jen-Tzung Chien, Kuan-Ting Kuo, 
Spectro-Temporal Neural Factorization for Speech Dereverberation.
ICASSP2018
Jen-Tzung Chien, Kai-Wei Tsou, 
Recall Neural Network for Source Separation.
SpeechComm2022
Gary Yeung, Ruchao Fan, Abeer Alwan, 
Fundamental frequency feature warping for frequency normalization and data augmentation in child automatic speech recognition.
ICASSP2022
Alexander Johnson, Ruchao Fan, Robin Morris, Abeer Alwan, 
LPC Augment: an LPC-based ASR Data Augmentation Algorithm for Low and Zero-Resource Children's Dialects.
ICASSP2022
Vijay Ravi, Jinhan Wang, Jonathan Flint, Abeer Alwan, 
Fraug: A Frame Rate Based Data Augmentation Method for Depression Detection from Speech Signals.
ICASSP2022
Yunzheng Zhu, Ruchao Fan, Abeer Alwan, 
Towards Better Meta-Initialization with Task Augmentation for Kindergarten-Aged Speech Recognition.
Interspeech2022
Amber Afshan, Abeer Alwan, 
Attention-based conditioning methods using variable frame rate for style-robust speaker verification.
Interspeech2022
Amber Afshan, Abeer Alwan, 
Learning from human perception to improve automatic speaker verification in style-mismatched conditions.
Interspeech2022
Ruchao Fan, Abeer Alwan, 
DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR.
Interspeech2022
Alexander Johnson, Kevin Everson, Vijay Ravi, Anissa Gladney, Mari Ostendorf, Abeer Alwan, 
Automatic Dialect Density Estimation for African American English.
Interspeech2022
Vijay Ravi, Jinhan Wang, Jonathan Flint, Abeer Alwan, 
A Step Towards Preserving Speakers' Identity While Detecting Depression Via Speaker Disentanglement.
Interspeech2022
Jinhan Wang, Vijay Ravi, Jonathan Flint, Abeer Alwan, 
Unsupervised Instance Discriminative Learning for Depression Detection from Speech Signals.
Interspeech2021
Ruchao Fan, Wei Chu, Peng Chang 0002, Jing Xiao 0006, Abeer Alwan, 
An Improved Single Step Non-Autoregressive Transformer for Automatic Speech Recognition.
Interspeech2021
Jinhan Wang, Yunzheng Zhu, Ruchao Fan, Wei Chu, Abeer Alwan, 
Low Resource German ASR with Untranscribed Data Spoken by Non-Native Children - INTERSPEECH 2021 Shared Task SPAPL System.
Interspeech2020
Amber Afshan, Jinxi Guo, Soo Jin Park, Vijay Ravi, Alan McCree, Abeer Alwan, 
Variable Frame Rate-Based Data Augmentation to Handle Speaking-Style Variability for Automatic Speaker Verification.
Interspeech2020
Amber Afshan, Jody Kreiman, Abeer Alwan, 
Speaker Discrimination in Humans and Machines: Effects of Speaking Style Variability.
Interspeech2020
Vijay Ravi, Ruchao Fan, Amber Afshan, Huanhua Lu, Abeer Alwan, 
Exploring the Use of an Unsupervised Autoregressive Model as a Shared Encoder for Text-Dependent Speaker Verification.
Interspeech2020
Trang Tran, Morgan Tinkler, Gary Yeung, Abeer Alwan, Mari Ostendorf, 
Analysis of Disfluency in Children's Speech.
SpeechComm2019
Jinxi Guo, Ning Xu 0010, Kailun Qian, Yang Shi, Kaiyuan Xu, Yingnian Wu, Abeer Alwan, 
Deep neural network based i-vector mapping for speaker verification using short utterances.
ICASSP2019
Soo Jin Park, Amber Afshan, Jody Kreiman, Gary Yeung, Abeer Alwan, 
Target and Non-target Speaker Discrimination by Humans and Machines.
Interspeech2019
Vijay Ravi, Soo Jin Park, Amber Afshan, Abeer Alwan, 
Voice Quality and Between-Frame Entropy for Sleepiness Estimation.
Interspeech2019
Gary Yeung, Abeer Alwan, 
A Frequency Normalization Technique for Kindergarten Speech Recognition Inspired by the Role of fo in Vowel Perception.
ICASSP2022
Yongjie Lv, Longbiao Wang, Meng Ge, Sheng Li 0010, Chenchen Ding, Lixin Pan, Yuguang Wang, Jianwu Dang 0001, Kiyoshi Honda, 
Compressing Transformer-Based ASR Model by Task-Driven Loss and Attention-Based Multi-Level Feature Distillation.
ICASSP2022
Kai Wang, Yizhou Peng, Hao Huang, Ying Hu, Sheng Li 0010, 
Mining Hard Samples Locally And Globally For Improved Speech Separation.
Interspeech2022
Soky Kak, Sheng Li 0010, Masato Mimura, Chenhui Chu, Tatsuya Kawahara, 
Leveraging Simultaneous Translation for Enhancing Transcription of Low-resource Language via Cross Attention Mechanism.
Interspeech2022
Kai Li 0018, Sheng Li 0010, Xugang Lu, Masato Akagi, Meng Liu, Lin Zhang, Chang Zeng, Longbiao Wang, Jianwu Dang 0001, Masashi Unoki, 
Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection.
Interspeech2022
Nan Li, Meng Ge, Longbiao Wang, Masashi Unoki, Sheng Li 0010, Jianwu Dang 0001, 
Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network.
Interspeech2022
Siqing Qin, Longbiao Wang, Sheng Li 0010, Yuqin Lin, Jianwu Dang 0001, 
Finer-grained Modeling units-based Meta-Learning for Low-resource Tibetan Speech Recognition.
Interspeech2022
Hao Shi, Longbiao Wang, Sheng Li 0010, Jianwu Dang 0001, Tatsuya Kawahara, 
Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction.
Interspeech2022
Longfei Yang, Wenqing Wei, Sheng Li 0010, Jiyi Li, Takahiro Shinozaki, 
Augmented Adversarial Self-Supervised Learning for Early-Stage Alzheimer's Speech Detection.
Interspeech2022
Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Sheng Li 0010, Raj Dabre, Raphael Rubino, Yi Zhao, 
Fusion of Self-supervised Learned Models for MOS Prediction.
ICASSP2021
Shunfei Chen, Xinhui Hu, Sheng Li 0010, Xinkang Xu, 
An Investigation of Using Hybrid Modeling Units for Improving End-to-End Speech Recognition System.
ICASSP2021
Nan Li, Longbiao Wang, Masashi Unoki, Sheng Li 0010, Rui Wang, Meng Ge, Jianwu Dang 0001, 
Robust Voice Activity Detection Using a Masked Auditory Encoder Based Convolutional Neural Network.
Interspeech2021
Kai Wang, Hao Huang, Ying Hu, Zhihua Huang, Sheng Li 0010, 
End-to-End Speech Separation Using Orthogonal Representation in Complex and Real Time-Frequency Domain.
Interspeech2021
Ding Wang, Shuaishuai Ye, Xinhui Hu, Sheng Li 0010, Xinkang Xu, 
An End-to-End Dialect Identification System with Transfer Learning from a Multilingual Automatic Speech Recognition Model.
TASLP2020
Peng Shen, Xugang Lu, Sheng Li 0010, Hisashi Kawai, 
Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification.
ICASSP2020
Yuqin Lin, Longbiao Wang, Jianwu Dang 0001, Sheng Li 0010, Chenchen Ding, 
End-to-End Articulatory Modeling for Dysarthric Articulatory Attribute Detection.
ICASSP2020
Hao Shi, Longbiao Wang, Meng Ge, Sheng Li 0010, Jianwu Dang 0001, 
Spectrograms Fusion with Minimum Difference Masks Estimation for Monaural Speech Dereverberation.
Interspeech2020
Yuqin Lin, Longbiao Wang, Sheng Li 0010, Jianwu Dang 0001, Chenchen Ding, 
Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription.
Interspeech2020
Hao Shi, Longbiao Wang, Sheng Li 0010, Chenchen Ding, Meng Ge, Nan Li, Jianwu Dang 0001, Hiroshi Seki, 
Singing Voice Extraction with Attention-Based Spectrograms Fusion.
ICASSP2019
Peng Shen, Xugang Lu, Sheng Li 0010, Hisashi Kawai, 
Interactive Learning of Teacher-student Model for Short Utterance Spoken Language Identification.
Interspeech2019
Sheng Li 0010, Chenchen Ding, Xugang Lu, Peng Shen, Tatsuya Kawahara, Hisashi Kawai, 
End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition.
Interspeech2021
Vikramjit Mitra, Zifang Huang, Colin Lea, Lauren Tooley, Sarah Wu, Darren Botten, Ashwini Palekar, Shrinath Thelapurath, Panayiotis G. Georgiou, Sachin Kajarekar, Jeffrey P. Bigham, 
Analysis and Tuning of a Voice Assistant System for Dysfluent Speech.
ICASSP2020
Sandeep Nallan Chakravarthula, Md. Nasir, Shao-Yen Tseng, Haoqi Li, Tae Jin Park, Brian R. Baucom, Craig J. Bryan, Shrikanth Narayanan, Panayiotis G. Georgiou, 
Automatic Prediction of Suicidal Risk in Military Couples Using Multimodal Interaction Cues from Couples Conversations.
ICASSP2020
Haoqi Li, Ming Tu, Jing Huang 0019, Shrikanth Narayanan, Panayiotis G. Georgiou, 
Speaker-Invariant Affective Representation Learning via Adversarial Training.
ICASSP2019
Nikolaos Flemotomos, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan, 
Role Specific Lattice Rescoring for Speaker Role Recognition from Speech Recognition Outputs.
Interspeech2019
Sandeep Nallan Chakravarthula, Haoqi Li, Shao-Yen Tseng, Maija Reblin, Panayiotis G. Georgiou, 
Predicting Behavior in Cancer-Afflicted Patient and Spouse Interactions Using Speech and Language.
Interspeech2019
Arindam Jati, Raghuveer Peri, Monisankha Pal, Tae Jin Park, Naveen Kumar 0004, Ruchir Travadi, Panayiotis G. Georgiou, Shrikanth Narayanan, 
Multi-Task Discriminative Training of Hybrid DNN-TVM Model for Speaker Verification with Noisy and Far-Field Speech.
Interspeech2019
Md. Nasir, Sandeep Nallan Chakravarthula, Brian R. W. Baucom, David C. Atkins, Panayiotis G. Georgiou, Shrikanth Narayanan, 
Modeling Interpersonal Linguistic Coordination in Conversations Using Word Mover's Distance.
Interspeech2019
Tae Jin Park, Manoj Kumar 0007, Nikolaos Flemotomos, Monisankha Pal, Raghuveer Peri, Rimita Lahiri, Panayiotis G. Georgiou, Shrikanth Narayanan, 
The Second DIHARD Challenge: System Description for USC-SAIL Team.
Interspeech2019
Tae Jin Park, Kyu J. Han, Jing Huang 0019, Xiaodong He 0001, Bowen Zhou, Panayiotis G. Georgiou, Shrikanth Narayanan, 
Speaker Diarization with Lexical Information.
Interspeech2019
Prashanth Gurunath Shivakumar, Mu Yang, Panayiotis G. Georgiou, 
Spoken Language Intent Detection Using Confusion2Vec.
Interspeech2019
Krishna Somandepalli, Naveen Kumar 0004, Arindam Jati, Panayiotis G. Georgiou, Shrikanth Narayanan, 
Multiview Shared Subspace Learning Across Speakers and Speech Commands.
ICASSP2018
Arindam Jati, Paula G. Williams, Brian R. Baucom, Panayiotis G. Georgiou, 
Towards Predicting Physiology from Speech During Stressful Conversations: Heart Rate and Respiratory Sinus Arrhythmia.
Interspeech2018
Sandeep Nallan Chakravarthula, Brian R. Baucom, Panayiotis G. Georgiou, 
Modeling Interpersonal Influence of Verbal Behavior in Couples Therapy Dyadic Interactions.
Interspeech2018
Arindam Jati, Panayiotis G. Georgiou, 
An Unsupervised Neural Prediction Framework for Learning Speaker Embeddings Using Recurrent Neural Networks.
Interspeech2018
Md. Nasir, Brian R. Baucom, Shrikanth S. Narayanan, Panayiotis G. Georgiou, 
Towards an Unsupervised Entrainment Distance in Conversational Speech Using Deep Neural Networks.
Interspeech2018
Tae Jin Park, Panayiotis G. Georgiou, 
Multimodal Speaker Segmentation and Diarization Using Lexical and Acoustic Cues via Sequence to Sequence Neural Networks.
ICASSP2017
Haoqi Li, Brian R. Baucom, Panayiotis G. Georgiou, 
Unsupervised latent behavior manifold learning from acoustic features: Audio2behavior.
Interspeech2017
James Gibson, Dogan Can, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan, 
Attention Networks for Modeling Behaviors in Addiction Counseling.
Interspeech2017
Arindam Jati, Panayiotis G. Georgiou, 
Speaker2Vec: Unsupervised Learning and Adaptation of a Speaker Manifold Using Deep Neural Networks with an Evaluation on Speaker Segmentation.
Interspeech2017
Karel Mundnich, Md. Nasir, Panayiotis G. Georgiou, Shrikanth S. Narayanan, 
Exploiting Intra-Annotator Rating Consistency Through Copeland's Method for Estimation of Ground Truth Labels in Couples' Therapy.
Interspeech2022
Salvatore Fara, Stefano Goria, Emilia Molimpakis, Nicholas Cummins, 
Speech and the n-Back task as a lens into depression. How combining both may allow us to isolate different core symptoms of depression.
Interspeech2022
Bahman Mirheidari, André Bittar, Nicholas Cummins, Johnny Downs, Helen L. Fisher, Heidi Christensen, 
Automatic Detection of Expressed Emotion from Five-Minute Speech Samples: Challenges and Opportunities.
TASLP2021
Kazi Nazmul Haque, Rajib Rana, Jiajun Liu, John H. L. Hansen, Nicholas Cummins, Carlos Busso, Björn W. Schuller, 
Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation Using Fewer Labelled Audio Data.
ICASSP2021
Chao Li, Boyang Chen, Ziping Zhao 0001, Nicholas Cummins, Björn W. Schuller, 
Hierarchical Attention-Based Temporal Convolutional Networks for Eeg-Based Emotion Recognition.
Interspeech2021
Judith Dineley, Grace Lavelle, Daniel Leightley, Faith Matcham, Sara Siddi, Maria Teresa Peñarrubia-María, Katie M. White, Alina Ivan, Carolin Oetzmann, Sara Simblett, Erin Dawe-Lane, Stuart Bruce, Daniel Stahl, Yatharth Ranjan, Zulqarnain Rashid, Pauline Conde, Amos A. Folarin, Josep Maria Haro, Til Wykes, Richard J. B. Dobson, Vaibhav A. Narayan, Matthew Hotopf, Björn W. Schuller, Nicholas Cummins, RADAR-CNS Consortium, 
Remote Smartphone-Based Speech Collection: Acceptance and Barriers in Individuals with Major Depressive Disorder.
ICASSP2020
Ziping Zhao 0001, Zhongtian Bao, Zixing Zhang 0001, Nicholas Cummins, Haishuai Wang, Björn W. Schuller, 
Hierarchical Attention Transfer Networks for Depression Assessment from Speech.
Interspeech2020
Merlin Albes, Zhao Ren, Björn W. Schuller, Nicholas Cummins, 
Squeeze for Sneeze: Compact Neural Networks for Cold and Flu Recognition.
Interspeech2020
Alice Baird, Nicholas Cummins, Sebastian Schnieder, Jarek Krajewski, Björn W. Schuller, 
An Evaluation of the Effect of Anxiety on Speech - Computational Prediction of Anxiety from Sustained Vowels.
Interspeech2020
Nicholas Cummins, Yilin Pan, Zhao Ren, Julian Fritsch, Venkata Srikanth Nallanthighal, Heidi Christensen, Daniel Blackburn, Björn W. Schuller, Mathew Magimai-Doss, Helmer Strik, Aki Härmä, 
A Comparison of Acoustic and Linguistics Methodologies for Alzheimer's Dementia Recognition.
Interspeech2020
Adria Mallol-Ragolta, Nicholas Cummins, Björn W. Schuller, 
An Investigation of Cross-Cultural Semi-Supervised Learning for Continuous Affect Recognition.
Interspeech2020
Zhao Ren, Jing Han 0010, Nicholas Cummins, Björn W. Schuller, 
Enhancing Transferability of Black-Box Adversarial Attacks via Lifelong Learning for Speech Emotion Recognition Models.
Interspeech2020
Ziping Zhao 0001, Qifei Li, Nicholas Cummins, Bin Liu 0041, Haishuai Wang, Jianhua Tao, Björn W. Schuller, 
Hybrid Network Feature Extraction for Depression Assessment from Speech.
Interspeech2019
Alice Baird, Shahin Amiriparian, Nicholas Cummins, Sarah Sturmbauer, Johanna Janson, Eva-Maria Meßner, Harald Baumeister, Nicolas Rohleder, Björn W. Schuller, 
Using Speech to Predict Sequentially Measured Cortisol Levels During a Trier Social Stress Test.
Interspeech2019
Adria Mallol-Ragolta, Ziping Zhao 0001, Lukas Stappen, Nicholas Cummins, Björn W. Schuller, 
A Hierarchical Attention Network-Based Approach for Depression Detection from Transcribed Clinical Interviews.
Interspeech2019
Maximilian Schmitt, Nicholas Cummins, Björn W. Schuller, 
Continuous Emotion Recognition in Speech - Do We Need Recurrence?
Interspeech2019
Xinzhou Xu, Jun Deng, Nicholas Cummins, Zixing Zhang 0001, Li Zhao 0003, Björn W. Schuller, 
Autonomous Emotion Learning in Speech: A View of Zero-Shot Speech Emotion Recognition.
Interspeech2019
Ziping Zhao 0001, Zhongtian Bao, Zixing Zhang 0001, Nicholas Cummins, Haishuai Wang, Björn W. Schuller, 
Attention-Enhanced Connectionist Temporal Classification for Discrete Speech Emotion Recognition.
Interspeech2018
Shahin Amiriparian, Alice Baird, Sahib Julka, Alyssa Alcorn, Sandra Ottl, Suncica Petrovic, Eloise Ainger, Nicholas Cummins, Björn W. Schuller, 
Recognition of Echolalic Autistic Child Vocalisations Utilising Convolutional Recurrent Neural Networks.
Interspeech2018
Alice Baird, Emilia Parada-Cabaleiro, Simone Hantke, Felix Burkhardt, Nicholas Cummins, Björn W. Schuller, 
The Perception and Analysis of the Likeability and Human Likeness of Synthesized Speech.
Interspeech2018
Eva-Maria Rathner, Julia Djamali, Yannik Terhorst, Björn W. Schuller, Nicholas Cummins, Gudrun Salamon, Christina Hunger-Schoppe, Harald Baumeister, 
How Did You like 2017? Detection of Language Markers of Depression and Narcissism in Personal Narratives.
TASLP2022
Brij Mohan Lal Srivastava, Mohamed Maouche, Md. Sahidullah, Emmanuel Vincent 0001, Aurélien Bellet, Marc Tommasi, Natalia A. Tomashenko, Xin Wang 0037, Junichi Yamagishi, 
Privacy and Utility of X-Vector Based Speaker Anonymization.
ICASSP2022
Michel Olvera, Emmanuel Vincent 0001, Gilles Gasso, 
On The Impact of Normalization Strategies in Unsupervised Adversarial Domain Adaptation for Acoustic Scene Classification.
Interspeech2022
Mohamed Maouche, Brij Mohan Lal Srivastava, Nathalie Vauquier, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent 0001, 
Enhancing Speech Privacy with Slicing.
Interspeech2021
Sunit Sivasankaran, Emmanuel Vincent 0001, Dominique Fohr, 
Explaining Deep Learning Models for Speech Enhancement.
ICASSP2020
Sunit Sivasankaran, Emmanuel Vincent 0001, Dominique Fohr, 
SLOGD: Speaker Location Guided Deflation Approach to Speech Separation.
ICASSP2020
Brij Mohan Lal Srivastava, Nathalie Vauquier, Md. Sahidullah, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent 0001, 
Evaluating Voice Conversion-Based Privacy Protection against Informed Attackers.
ICASSP2020
Nicolas Turpault, Romain Serizel, Emmanuel Vincent 0001, 
Limitations of Weak Labels for Embedding and Tagging.
Interspeech2020
Samuele Cornell, Maurizio Omologo, Stefano Squartini, Emmanuel Vincent 0001, 
Detecting and Counting Overlapping Speakers in Distant Speech Scenarios.
Interspeech2020
Mathieu Hu, Laurent Pierron, Emmanuel Vincent 0001, Denis Jouvet, 
Kaldi-Web: An Installation-Free, On-Device Speech Recognition System.
Interspeech2020
Mohamed Maouche, Brij Mohan Lal Srivastava, Nathalie Vauquier, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent 0001, 
A Comparative Study of Speech Anonymization Metrics.
Interspeech2020
Manuel Pariente, Samuele Cornell, Joris Cosentino, Sunit Sivasankaran, Efthymios Tzinis, Jens Heitkaemper, Michel Olvera, Fabian-Robert Stöter, Mathieu Hu, Juan M. Martín-Doñas, David Ditter, Ariel Frank, Antoine Deleforge, Emmanuel Vincent 0001, 
Asteroid: The PyTorch-Based Audio Source Separation Toolkit for Researchers.
Interspeech2020
Imran A. Sheikh, Emmanuel Vincent 0001, Irina Illina, 
On Semi-Supervised LF-MMI Training of Acoustic Models with Limited Data.
Interspeech2020
Brij Mohan Lal Srivastava, Natalia A. Tomashenko, Xin Wang 0037, Emmanuel Vincent 0001, Junichi Yamagishi, Mohamed Maouche, Aurélien Bellet, Marc Tommasi, 
Design Choices for X-Vector Based Speaker Anonymization.
Interspeech2020
Natalia A. Tomashenko, Brij Mohan Lal Srivastava, Xin Wang 0037, Emmanuel Vincent 0001, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Jose Patino 0001, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco, 
Introducing the VoicePrivacy Initiative.
Interspeech2020
M. A. Tugtekin Turan, Emmanuel Vincent 0001, Denis Jouvet, 
Achieving Multi-Accent ASR via Unsupervised Acoustic Model Adaptation.
SpeechComm2019
Nancy Bertin, Ewen Camberlein, Romain Lebarbenchon, Emmanuel Vincent 0001, Sunit Sivasankaran, Irina Illina, Frédéric Bimbot, 
VoiceHome-2, an extended corpus for multichannel speech processing in real homes.
TASLP2019
Annamaria Mesaros, Aleksandr Diment, Benjamin Elizalde, Toni Heittola, Emmanuel Vincent 0001, Bhiksha Raj, Tuomas Virtanen, 
Sound Event Detection in the DCASE 2017 Challenge.
ICASSP2019
Dayana Ribas, Emmanuel Vincent 0001, 
An Improved Uncertainty Propagation Method for Robust I-vector Based Speaker Recognition.
Interspeech2019
Manuel Pariente, Antoine Deleforge, Emmanuel Vincent 0001, 
A Statistically Principled and Computationally Efficient Approach to Speech Enhancement Using Variational Autoencoders.
Interspeech2019
Brij Mohan Lal Srivastava, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent 0001, 
Privacy-Preserving Adversarial Representation Learning in ASR: Reality or Illusion?
ICASSP2022
Jingbei Li, Yi Meng, Chenyi Li, Zhiyong Wu 0001, Helen Meng, Chao Weng, Dan Su 0002, 
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling.
ICASSP2022
Brian Yan, Chunlei Zhang, Meng Yu 0003, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe 0001, Dong Yu 0001, 
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization.
ICASSP2022
Chunlei Zhang, Jiatong Shi, Chao Weng, Meng Yu 0003, Dong Yu 0001, 
Towards end-to-end Speaker Diarization with Generalized Neural Speaker Clustering.
ICASSP2022
Naijun Zheng, Na Li 0012, Xixin Wu, Lingwei Meng, Jiawen Kang, Haibin Wu, Chao Weng, Dan Su 0002, Helen Meng, 
The CUHK-Tencent Speaker Diarization System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
ICASSP2022
Naijun Zheng, Na Li 0012, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
Multi-Channel Speaker Diarization Using Spatial Features for Meetings.
Interspeech2022
Xiaoyi Qin, Na Li 0012, Chao Weng, Dan Su 0002, Ming Li 0026, 
Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings.
Interspeech2022
Helin Wang, Dongchao Yang, Chao Weng, Jianwei Yu, Yuexian Zou, 
Improving Target Sound Extraction with Timestamp Information.
ICASSP2021
Xu Li, Na Li 0012, Chao Weng, Xunying Liu, Dan Su 0002, Dong Yu 0001, Helen Meng, 
Replay and Synthetic Speech Detection with Res2Net Architecture.
ICASSP2021
Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe 0001, Meng Yu 0003, Dong Yu 0001, 
Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation.
ICASSP2021
Xingchen Song, Zhiyong Wu 0001, Yiheng Huang, Chao Weng, Dan Su 0002, Helen M. Meng, 
Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input.
ICASSP2021
Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe 0001, Meng Yu 0003, Yong Xu 0004, Shi-Xiong Zhang, Dong Yu 0001, 
Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization.
ICASSP2021
Wei Xia, Chunlei Zhang, Chao Weng, Meng Yu 0003, Dong Yu 0001, 
Self-Supervised Text-Independent Speaker Verification Using Prototypical Momentum Contrastive Learning.
ICASSP2021
Chunlei Zhang, Meng Yu 0003, Chao Weng, Dong Yu 0001, 
Towards Robust Speaker Verification with Target Speaker Enhancement.
ICASSP2021
Naijun Zheng, Na Li 0012, Bo Wu, Meng Yu 0003, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
A Joint Training Framework of Multi-Look Separator and Speaker Embedding Extractor for Overlapped Speech.
Interspeech2021
Guoguo Chen, Shuzhou Chai, Guan-Bo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su 0002, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe 0001, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Zhao You, Zhiyong Yan, 
GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10,000 Hours of Transcribed Audio.
Interspeech2021
Max W. Y. Lam, Jun Wang 0091, Chao Weng, Dan Su 0002, Dong Yu 0001, 
Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition.
Interspeech2021
Helin Wang, Bo Wu, Lianwu Chen, Meng Yu 0003, Jianwei Yu, Yong Xu 0004, Shi-Xiong Zhang, Chao Weng, Dan Su 0002, Dong Yu 0001, 
TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation.
ICASSP2020
Chengqi Deng, Chengzhu Yu, Heng Lu 0004, Chao Weng, Dong Yu 0001, 
Pitchnet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network.
ICASSP2020
Aswin Shanmugam Subramanian, Chao Weng, Meng Yu 0003, Shi-Xiong Zhang, Yong Xu 0004, Shinji Watanabe 0001, Dong Yu 0001, 
Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives.
ICASSP2020
Zhao You, Dan Su 0002, Jie Chen 0057, Chao Weng, Dong Yu 0001, 
Dfsmn-San with Persistent Memory Model for Automatic Speech Recognition.
TASLP2022
Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.
TASLP2022
Boyang Xue, Shoukang Hu, Junhao Xu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Bayesian Neural Network Language Modeling for Speech Recognition.
ICASSP2022
Xixin Wu, Shoukang Hu, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Neural Architecture Search for Speech Emotion Recognition.
Interspeech2022
Mingyu Cui, Jiajun Deng, Shoukang Hu, Xurong Xie, Tianzi Wang, Shujie Hu, Mengzhe Geng, Boyang Xue, Xunying Liu, Helen Meng, 
Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems.
Interspeech2022
Tianzi Wang, Jiajun Deng, Mengzhe Geng, Zi Ye, Shoukang Hu, Yi Wang, Mingyu Cui, Zengrui Jin, Xunying Liu, Helen Meng, 
Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection.
Interspeech2022
Yi Wang, Tianzi Wang, Zi Ye, Lingwei Meng, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng, 
Exploring linguistic feature and model combination for speech recognition based automatic AD detection.
Interspeech2022
Junhao Xu, Shoukang Hu, Xunying Liu, Helen Meng, 
Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus.
TASLP2021
Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng, 
Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition.
TASLP2021
Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng, 
Recent Progress in the CUHK Dysarthric Speech Recognition System.
TASLP2021
Junhao Xu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng, 
Mixed Precision Low-Bit Quantization of Neural Network Language Models for Speech Recognition.
TASLP2021
Jianwei Yu, Shi-Xiong Zhang, Bo Wu, Shansong Liu, Shoukang Hu, Mengzhe Geng, Xunying Liu, Helen Meng, Dong Yu 0001, 
Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech.
ICASSP2021
Shoukang Hu, Xurong Xie, Shansong Liu, Mingyu Cui, Mengzhe Geng, Xunying Liu, Helen Meng, 
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.
ICASSP2021
Junhao Xu, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen Meng, 
Mixed Precision Quantization of Transformer Language Models for Speech Recognition.
ICASSP2021
Zi Ye, Shoukang Hu, Jinchao Li, Xurong Xie, Mengzhe Geng, Jianwei Yu, Junhao Xu, Boyang Xue, Shansong Liu, Xunying Liu, Helen Meng, 
Development of the Cuhk Elderly Speech Recognition System for Neurocognitive Disorder Detection Using the Dementiabank Corpus.
Interspeech2021
Jiajun Deng, Fabian Ritter Gutierrez, Shoukang Hu, Mengzhe Geng, Xurong Xie, Zi Ye, Shansong Liu, Jianwei Yu, Xunying Liu, Helen Meng, 
Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and Dysarthric Speech Recognition.
Interspeech2021
Mengzhe Geng, Shansong Liu, Jianwei Yu, Xurong Xie, Shoukang Hu, Zi Ye, Zengrui Jin, Xunying Liu, Helen Meng, 
Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition.
ICASSP2020
Junhao Xu, Xie Chen 0001, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen Meng, 
Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers.
Interspeech2020
Mengzhe Geng, Xurong Xie, Shansong Liu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng, 
Investigation of Data Augmentation Techniques for Disordered Speech Recognition.
Interspeech2020
Shansong Liu, Xurong Xie, Jianwei Yu, Shoukang Hu, Mengzhe Geng, Rongfeng Su, Shi-Xiong Zhang, Xunying Liu, Helen Meng, 
Exploiting Cross-Domain Visual Feature Generation for Disordered Speech Recognition.
ICASSP2019
Shoukang Hu, Max W. Y. Lam, Xurong Xie, Shansong Liu, Jianwei Yu, Xixin Wu, Xunying Liu, Helen Meng, 
Bayesian and Gaussian Process Neural Networks for Large Vocabulary Continuous Speech Recognition.
TASLP2021
Nauman Dawalatabad, Srikanth R. Madikeri, C. Chandra Sekhar, Hema A. Murthy, 
Novel Architectures for Unsupervised Information Bottleneck Based Speaker Diarization of Meetings.
Interspeech2021
Mari Ganesh Kumar, Jom Kuriakose, Anand Thyagachandran, Arun Kumar A, Ashish Seth, Lodagala Durga Prasad, Saish Jaiswal, Anusha Prakash 0001, Hema A. Murthy, 
Dual Script E2E Framework for Multilingual and Code-Switching ASR.
SpeechComm2020
Arun Baby, Jeena J. Prakash, Aswin Shanmugam Subramanian, Hema A. Murthy, 
Significance of spectral cues in automatic speech segmentation for Indian language speech synthesizers.
ICASSP2020
Venkata Subramanian Viraraghavan, Arpan Pal 0001, Hema A. Murthy, Rangarajan Aravind, 
State-Based Transcription of Components of Carnatic Music.
Interspeech2020
Mano Ranjith Kumar M., Sudhanshu Srivastava, Anusha Prakash 0001, Hema A. Murthy, 
A Hybrid HMM-Waveglow Based Text-to-Speech Synthesizer Using Histogram Equalization for Low Resource Indian Languages.
Interspeech2020
Anusha Prakash 0001, Hema A. Murthy, 
Generic Indic Text-to-Speech Synthesisers with Rapid Adaptation in an End-to-End Framework.
Interspeech2020
Karthik Pandia D. S, Anusha Prakash 0001, Mano Ranjith Kumar M., Hema A. Murthy, 
Exploration of End-to-End Synthesisers for Zero Resource Speech Challenge 2020.
Interspeech2020
Rini A. Sharon, Hema A. Murthy, 
The "Sound of Silence" in EEG - Cognitive Voice Activity Detection.
ICASSP2019
Nauman Dawalatabad, Srikanth R. Madikeri, C. Chandra Sekhar, Hema A. Murthy, 
Incremental Transfer Learning in Two-pass Information Bottleneck Based Speaker Diarization System for Meetings.
ICASSP2019
Rini A. Sharon, Shrikanth S. Narayanan, Mriganka Sur, Hema A. Murthy, 
An Empirical Study of Speech Processing in the Brain by Analyzing the Temporal Syllable Structure in Speech-input Induced EEG.
Interspeech2019
Karthik Pandia D. S, Hema A. Murthy, 
Zero Resource Speech Synthesis Using Transcripts Derived from Perceptual Acoustic Units.
Interspeech2018
Nauman Dawalatabad, Jom Kuriakose, Chellu Chandra Sekhar, Hema A. Murthy, 
Information Bottleneck Based Percussion Instrument Diarization System for Taniavartanam Segments of Carnatic Music Concerts.
Interspeech2018
Gayathri G, N. Mohana, Radhika Pal, Hema A. Murthy, 
Mobile Application for Learning Languages for the Unlettered.
Interspeech2018
G. R. Kasthuri, Prabha Ramanathan, Hema A. Murthy, Namita Jacob, Anil Prabhakar, 
Early Vocabulary Development Through Picture-based Software Solutions.
Interspeech2018
Mahesh M, Jeena J. Prakash, Hema A. Murthy, 
Resyllabification in Indian Languages and Its Implications in Text-to-speech Systems.
Interspeech2018
Srihari Maruthachalam, Sidharth Aggarwal, Mari Ganesh Kumar, Mriganka Sur, Hema A. Murthy, 
Brain-Computer Interface using Electroencephalogram Signatures of Eye Blinks.
Interspeech2018
Jeena J. Prakash, Rajan Golda Brunet, Hema A. Murthy, 
Transcription Correction for Indian Languages Using Acoustic Signatures.
Interspeech2018
M. S. Saranya, Hema A. Murthy, 
Decision-level Feature Switching as a Paradigm for Replay Attack Detection.
Interspeech2018
Jilt Sebastian, Manoj Kumar 0007, Pavan Kumar D. S., Mathew Magimai-Doss, Hema A. Murthy, Shrikanth S. Narayanan, 
Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech.
Interspeech2018
Anju Leela Thomas, Anusha Prakash 0001, Arun Baby, Hema A. Murthy, 
Code-switching in Indic Speech Synthesisers.
TASLP2022
Yonggang Hu, Prasanga N. Samarasinghe, Sharon Gannot, Thushara D. Abhayapala, 
Decoupled Multiple Speaker Direction-of-Arrival Estimator Under Reverberant Environments.
TASLP2021
Dovid Y. Levin, Shmulik Markovich-Golan, Sharon Gannot, 
Near-Field Superdirectivity: An Analytical Perspective.
Interspeech2021
Aviad Eisenberg, Boaz Schwartz, Sharon Gannot, 
Online Blind Audio Source Separation Using Recursive Expectation-Maximization.
Interspeech2021
Yochai Yemini, Ethan Fetaya, Haggai Maron, Sharon Gannot, 
Scene-Agnostic Multi-Microphone Speech Dereverberation.
TASLP2020
Dani Cherkassky, Sharon Gannot, 
Successive Relative Transfer Function Identification Using Blind Oblique Projection.
TASLP2020
Yonggang Hu, Prasanga N. Samarasinghe, Sharon Gannot, Thushara D. Abhayapala, 
Semi-Supervised Multiple Source Localization Using Relative Harmonic Coefficients Under Noisy and Reverberant Environments.
TASLP2020
Bracha Laufer-Goldshtein, Ronen Talmon, Sharon Gannot, 
Global and Local Simplex Representations for Multichannel Source Separation.
TASLP2020
Yaron Laufer, Bracha Laufer-Goldshtein, Sharon Gannot, 
ML Estimation and CRBs for Reverberation, Speech, and Noise PSDs in Rank-Deficient Noise Field.
TASLP2020
Koby Weisberg, Bracha Laufer-Goldshtein, Sharon Gannot, 
Simultaneous Tracking and Separation of Multiple Sources Using Factor Graph Model.
ICASSP2020
Elior Hadad, Sharon Gannot, 
Maximum Likelihood Multi-Speaker Direction of Arrival Estimation Utilizing a Weighted Histogram.
ICASSP2020
Yonggang Hu, Prasanga N. Samarasinghe, Thushara D. Abhayapala, Sharon Gannot, 
Unsupervised Multiple Source Localization Using Relative Harmonic Coefficients.
ICASSP2020
Yaniv Opochinsky, Shlomo E. Chazan, Sharon Gannot, Jacob Goldberger, 
K-Autoencoders Deep Clustering.
ICASSP2020
Yochai Yemini, Shlomo E. Chazan, Jacob Goldberger, Sharon Gannot, 
A Composite DNN Architecture for Speech Enhancement.
TASLP2019
Yaron Laufer, Sharon Gannot, 
A Bayesian Hierarchical Model for Speech Enhancement With Time-Varying Audio Channel.
TASLP2019
Xiaofei Li 0001, Laurent Girin, Sharon Gannot, Radu Horaud, 
Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function.
ICASSP2019
Andreas Brendel, Bracha Laufer-Goldshtein, Sharon Gannot, Ronen Talmon, Walter Kellermann, 
Localization of an Unknown Number of Speakers in Adverse Acoustic Conditions Using Reliability Information and Diarization.
TASLP2018
Sebastian Braun, Adam Kuklasinski, Ofer Schwartz, Oliver Thiergart, Emanuël A. P. Habets, Sharon Gannot, Simon Doclo, Jesper Jensen 0001, 
Evaluation and Comparison of Late Reverberation Power Spectral Density Estimators.
TASLP2018
Bracha Laufer-Goldshtein, Ronen Talmon, Sharon Gannot, 
A Hybrid Approach for Speaker Tracking Based on TDOA and Data-Driven Models.
TASLP2018
Xiaofei Li 0001, Sharon Gannot, Laurent Girin, Radu Horaud, 
Multichannel Identification and Nonnegative Equalization for Dereverberation and Noise Reduction Based on Convolutive Transfer Function.
ICASSP2018
Bracha Laufer-Goldshtein, Ronen Talmon, Israel Cohen, Sharon Gannot, 
Multi-View Source Localization Based on Power Ratios.
TASLP2022
P. V. Muhammed Shifas, Catalin Zorila, Yannis Stylianou, 
End-to-End Neural Based Modification of Noisy Speech for Speech-in-Noise Intelligibility Improvement.
Interspeech2022
Tuomo Raitio, Petko Petkov, Jiangchuan Li, P. V. Muhammed Shifas, Andrea Davis, Yannis Stylianou, 
Vocal effort modeling in neural TTS for improving the intelligibility of synthetic speech in noise.
Interspeech2021
Dipjyoti Paul, Sankar Mukherjee, Yannis Pantazis, Yannis Stylianou, 
A Universal Multi-Speaker Multi-Style Text-to-Speech via Disentangled Representation Learning Based on Rényi Divergence Minimization.
Interspeech2020
Dipjyoti Paul, Yannis Pantazis, Yannis Stylianou, 
Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions.
Interspeech2020
Dipjyoti Paul, P. V. Muhammed Shifas, Yannis Pantazis, Yannis Stylianou, 
Enhancing Speech Intelligibility in Text-To-Speech Synthesis Using Speaking Style Conversion.
ICASSP2019
Petko Nikolov Petkov, Vasileios Tsiaras, Rama Doddipatla, Yannis Stylianou, 
An Unsupervised Learning Approach to Neural-net-supported Wpe Dereverberation.
Interspeech2019
Nagaraj Adiga, Yannis Pantazis, Vassilis Tsiaras, Yannis Stylianou, 
Speech Enhancement for Noise-Robust Speech Synthesis Using Wasserstein GAN.
Interspeech2019
Dipjyoti Paul, Yannis Pantazis, Yannis Stylianou, 
Non-Parallel Voice Conversion Using Weighted Generative Adversarial Networks.
Interspeech2019
P. V. Muhammed Shifas, Nagaraj Adiga, Vassilis Tsiaras, Yannis Stylianou, 
A Non-Causal FFTNet Architecture for Speech Enhancement.
ICASSP2018
Alexandros Papangelis, Margarita Kotti, Yannis Stylianou, 
Towards Scalable Information-Seeking Multi-Domain Dialogue.
ICASSP2018
Jonathan Parker, Yannis Stylianou, Roberto Cipolla, 
Adaptation of an Expressive Single Speaker Deep Neural Network Speech Synthesis System.
Interspeech2018
Cong-Thanh Do, Yannis Stylianou, 
Weighting Time-Frequency Representation of Speech Using Auditory Saliency for Automatic Speech Recognition.
Interspeech2018
Margarita Kotti, Vassilios Diakoloukas, Alexandros Papangelis, Michail Lagoudakis, Yannis Stylianou, 
A Case Study on the Importance of Belief State Representation for Dialogue Policy Management.
Interspeech2018
P. V. Muhammed Shifas, Vassilis Tsiaras, Yannis Stylianou, 
Speech Intelligibility Enhancement Based on a Non-causal Wavenet-like Model.
ICASSP2017
Margarita Kotti, Yannis Stylianou, 
Effective emotion recognition in movie audio tracks.
ICASSP2017
Alexandros Papangelis, Margarita Kotti, Yannis Stylianou, 
Predicting dialogue success, naturalness, and length with acoustic features.
ICASSP2017
Jonathan Parker, Ranniery Maia, Yannis Stylianou, Roberto Cipolla, 
Expressive visual text to speech and expression adaptation using deep neural networks.
ICASSP2017
Petko Nikolov Petkov, Yannis Stylianou, 
Adaptive gain control and time warp for enhanced speech intelligibility under reverberation.
Interspeech2017
Cong-Thanh Do, Yannis Stylianou, 
Improved Automatic Speech Recognition Using Subband Temporal Envelope Features and Time-Delay Neural Network Denoising Autoencoder.
Interspeech2017
Tudor-Catalin Zorila, Yannis Stylianou, 
On the Quality and Intelligibility of Noisy Speech Processed for Near-End Listening Enhancement.
ICASSP2022
Takuhiro Kaneko, Kou Tanaka, Hirokazu Kameoka, Shogo Seki, 
ISTFTNET: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform.
ICASSP2022
Li Li 0063, Hirokazu Kameoka, Shogo Seki, 
HBP: An Efficient Block Permutation Solver Using Hungarian Algorithm and Spectrogram Inpainting for Multichannel Audio Source Separation.
Interspeech2022
Hirokazu Kameoka, Takuhiro Kaneko, Shogo Seki, Kou Tanaka, 
CAUSE: Crossmodal Action Unit Sequence Estimation from Speech.
Interspeech2022
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Shogo Seki, 
MISRNet: Lightweight Neural Vocoder Using Multi-Input Single Shared Residual Blocks.
TASLP2021
Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda, 
Pretraining Techniques for Sequence-to-Sequence Voice Conversion.
ICASSP2021
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo, 
Maskcyclegan-VC: Learning Non-Parallel Voice Conversion with Filling in Frames.
Interspeech2021
Shoki Sakamoto, Akira Taniguchi, Tadahiro Taniguchi, Hirokazu Kameoka, 
StarGAN-VC+ASR: StarGAN-Based Non-Parallel Voice Conversion Regularized by Automatic Speech Recognition.
TASLP2020
Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo, 
Nonparallel Voice Conversion With Augmented Classifier Star Generative Adversarial Networks.
TASLP2020
Hirokazu Kameoka, Kou Tanaka, Damian Kwasny, Takuhiro Kaneko, Nobukatsu Hojo, 
ConvS2S-VC: Fully Convolutional Sequence-to-Sequence Voice Conversion.
Interspeech2020
Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda, 
Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining.
Interspeech2020
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo, 
CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-Spectrogram Conversion.
TASLP2019
Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo, 
ACVAE-VC: Non-Parallel Voice Conversion With Auxiliary Classifier Variational Autoencoder.
ICASSP2019
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo, 
Cyclegan-VC2: Improved Cyclegan-based Non-parallel Voice Conversion.
ICASSP2019
Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Nobukatsu Hojo, 
ATTS2S-VC: Sequence-to-sequence Voice Conversion with Attention and Context Preservation Mechanisms.
Interspeech2019
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo, 
StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion.
Interspeech2019
Dongxiao Wang, Hirokazu Kameoka, Koichi Shinoda, 
A Modified Algorithm for Multiple Input Spectrogram Inversion.
ICASSP2018
Kou Tanaka, Hirokazu Kameoka, Kazuho Morikawa, 
Vae-Space: Deep Generative Model of Voice Fundamental Frequency Contours.
ICASSP2017
Hirokazu Kameoka, Hideaki Kagami, Masahiro Yukawa, 
Complex NMF with the generalized Kullback-Leibler divergence.
ICASSP2017
Ryotaro Sato, Hirokazu Kameoka, Kunio Kashino, 
Fast algorithm for statistical phrase/accent command estimation based on generative model incorporating spectral features.
ICASSP2017
Yusuke Tajiri, Hirokazu Kameoka, Tomoki Toda, 
A noise suppression method for body-conducted soft speech based on non-negative tensor factorization of air- and body-conducted signals.
ICASSP2022
Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu 0003, Zhenyu Tang 0001, Dinesh Manocha, Dong Yu 0001, 
Fast-Rir: Fast Neural Diffuse Room Impulse Response Generator.
ICASSP2022
Brian Yan, Chunlei Zhang, Meng Yu 0003, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe 0001, Dong Yu 0001, 
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization.
ICASSP2022
Chunlei Zhang, Jiatong Shi, Chao Weng, Meng Yu 0003, Dong Yu 0001, 
Towards end-to-end Speaker Diarization with Generalized Neural Speaker Clustering.
Interspeech2022
Vinay Kothapally, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Dong Yu 0001, 
Joint Neural AEC and Beamforming with Double-Talk Detection.
TASLP2021
Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu 0004, Meng Yu 0003, Dong Yu 0001, Jesper Jensen 0001, 
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation.
TASLP2021
Zhuohuang Zhang, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Lianwu Chen, Donald S. Williamson, Dong Yu 0001, 
Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation.
ICASSP2021
Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe 0001, Meng Yu 0003, Dong Yu 0001, 
Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation.
ICASSP2021
Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe 0001, Meng Yu 0003, Yong Xu 0004, Shi-Xiong Zhang, Dong Yu 0001, 
Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization.
ICASSP2021
Wei Xia, Chunlei Zhang, Chao Weng, Meng Yu 0003, Dong Yu 0001, 
Self-Supervised Text-Independent Speaker Verification Using Prototypical Momentum Contrastive Learning.
ICASSP2021
Chunlei Zhang, Meng Yu 0003, Chao Weng, Dong Yu 0001, 
Towards Robust Speaker Verification with Target Speaker Enhancement.
ICASSP2021
Naijun Zheng, Na Li 0012, Bo Wu, Meng Yu 0003, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
A Joint Training Framework of Multi-Look Separator and Speaker Embedding Extractor for Overlapped Speech.
Interspeech2021
Xiyun Li, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Jiaming Xu 0001, Bo Xu 0002, Dong Yu 0001, 
MIMO Self-Attentive RNN Beamformer for Multi-Speaker Speech Separation.
Interspeech2021
Helin Wang, Bo Wu, Lianwu Chen, Meng Yu 0003, Jianwei Yu, Yong Xu 0004, Shi-Xiong Zhang, Chao Weng, Dan Su 0002, Dong Yu 0001, 
TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation.
Interspeech2021
Yong Xu 0004, Zhuohuang Zhang, Meng Yu 0003, Shi-Xiong Zhang, Dong Yu 0001, 
Generalized Spatio-Temporal RNN Beamformer for Target Speech Separation.
Interspeech2021
Meng Yu 0003, Chunlei Zhang, Yong Xu 0004, Shi-Xiong Zhang, Dong Yu 0001, 
MetricNet: Towards Improved Modeling For Non-Intrusive Speech Quality Assessment.
ICASSP2020
Rongzhi Gu, Shi-Xiong Zhang, Lianwu Chen, Yong Xu 0004, Meng Yu 0003, Dan Su 0002, Yuexian Zou, Dong Yu 0001, 
Enhancing End-to-End Multi-Channel Speech Separation Via Spatial Feature Learning.
ICASSP2020
Aswin Shanmugam Subramanian, Chao Weng, Meng Yu 0003, Shi-Xiong Zhang, Yong Xu 0004, Shinji Watanabe 0001, Dong Yu 0001, 
Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives.
Interspeech2020
Meng Yu 0003, Xuan Ji, Bo Wu, Dan Su 0002, Dong Yu 0001, 
End-to-End Multi-Look Keyword Spotting.
Interspeech2020
Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Lianwu Chen, Chao Weng, Jianming Liu, Dong Yu 0001, 
Neural Spatio-Temporal Beamformer for Target Speech Separation.
Interspeech2020
Chengzhu Yu, Heng Lu 0004, Na Hu, Meng Yu 0003, Chao Weng, Kun Xu 0005, Peng Liu, Deyi Tuo, Shiyin Kang, Guangzhi Lei, Dan Su 0002, Dong Yu 0001, 
DurIAN: Duration Informed Attention Network for Speech Synthesis.
ICASSP2022
Hua Shen, Yuguang Yang 0004, Guoli Sun, Ryan Langman, Eunjung Han, Jasha Droppo, Andreas Stolcke, 
Improving Fairness in Speaker Verification via Group-Adapted Fusion Network.
Interspeech2022
Minho Jin, Chelsea Ju, Zeya Chen, Yi-Chieh Liu, Jasha Droppo, Andreas Stolcke, 
Adversarial Reweighting for Speaker Verification Fairness.
Interspeech2022
Viet Anh Trinh, Pegah Ghahremani, Brian King, Jasha Droppo, Andreas Stolcke, Roland Maas, 
Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation.
KDD2022
Gopinath Chennupati, Milind Rao, Gurpreet Chadha, Aaron Eakin, Anirudh Raju, Gautam Tiwari, Anit Kumar Sahu, Ariya Rastrow, Jasha Droppo, Andy Oberlin, Buddha Nandanoor, Prahalad Venkataramanan, Zheng Wu, Pankaj Sitpure, 
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale.
ICASSP2021
Yixin Chen 0003, Weiyi Lu, Alejandro Mottini, Li Erran Li, Jasha Droppo, Zheng Du, Belinda Zeng, 
Top-Down Attention in End-to-End Spoken Language Understanding.
ICASSP2021
Surabhi Punjabi, Harish Arsikere, Zeynab Raeesy, Chander Chandak, Nikhil Bhave, Ankish Bansal, Markus Müller, Sergio Murillo, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Sri Garimella, Roland Maas, Mat Hans, Athanasios Mouchtaris, Siegfried Kunzmann, 
Joint ASR and Language Identification Using RNN-T: An Efficient Approach to Dynamic Language Switching.
ICASSP2021
Milind Rao, Pranav Dheram, Gautam Tiwari, Anirudh Raju, Jasha Droppo, Ariya Rastrow, Andreas Stolcke, 
DO as I Mean, Not as I Say: Sequence Loss Training for Spoken Language Understanding.
ICASSP2021
Andrew Werchniak, Roberto Barra-Chicote, Yuriy Mishchenko, Jasha Droppo, Jeff Condal, Peng Liu, Anish Shah, 
Exploring the application of synthetic audio in training keyword spotters.
Interspeech2021
Jasha Droppo, Oguz Elibol, 
Scaling Laws for Acoustic Models.
Interspeech2021
Amin Fazel, Wei Yang, Yulan Liu, Roberto Barra-Chicote, Yixiong Meng, Roland Maas, Jasha Droppo, 
SynthASR: Unlocking Synthetic Data for Speech Recognition.
Interspeech2021
Daniel Korzekwa, Roberto Barra-Chicote, Szymon Zaporowski, Grzegorz Beringer, Jaime Lorenzo-Trueba, Alicja Serafinowicz, Jasha Droppo, Thomas Drugman, Bozena Kostek, 
Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention.
Interspeech2021
Jie Pu, Yuguang Yang 0004, Ruirui Li, Oguz Elibol, Jasha Droppo, 
Scaling Effect of Self-Supervised Speech Models.
Interspeech2021
Swayambhu Nath Ray, Minhua Wu, Anirudh Raju, Pegah Ghahremani, Raghavendra Bilgi, Milind Rao, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, 
Listen with Intent: Improving Speech Recognition with Audio-to-Intent Front-End.
Interspeech2021
Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas, 
wav2vec-C: A Self-Supervised Model for Speech Representation Learning.
Interspeech2021
Muhammad A. Shah, Joseph Szurley, Markus Müller, Athanasios Mouchtaris, Jasha Droppo, 
Evaluating the Vulnerability of End-to-End Automatic Speech Recognition Models to Membership Inference Attacks.
Interspeech2021
Rupak Vignesh Swaminathan, Brian King, Grant P. Strimel, Jasha Droppo, Athanasios Mouchtaris, 
CoDERT: Distilling Encoder Representations with Co-Learning for Transducer-Based Speech Recognition.
Interspeech2021
Iván Vallés-Pérez, Julian Roth, Grzegorz Beringer, Roberto Barra-Chicote, Jasha Droppo, 
Improving Multi-Speaker TTS Prosody Variance with a Residual Encoder and Normalizing Flows.
Interspeech2020
Jinxi Guo, Gautam Tiwari, Jasha Droppo, Maarten Van Segbroeck, Che-Wei Huang, Andreas Stolcke, Roland Maas, 
Efficient Minimum Word Error Rate Training of RNN-Transducer for End-to-End Speech Recognition.
TASLP2018
Zhehuai Chen, Jasha Droppo, Jinyu Li 0001, Wayne Xiong, 
Progressive Joint Modeling in Unsupervised Single-Channel Overlapped Speech Recognition.
ICASSP2018
Zhehuai Chen, Jasha Droppo, 
Sequence Modeling in Unsupervised Single-Channel Overlapped Speech Recognition.
ICASSP2022
Madhu R. Kamble, Jose Patino 0001, Maria A. Zuluaga, Massimiliano Todisco, 
Exploring Auditory Acoustic Features for The Diagnosis of Covid-19.
ICASSP2022
Hemlata Tak, Madhu R. Kamble, Jose Patino 0001, Massimiliano Todisco, Nicholas W. D. Evans, 
Rawboost: A Raw Data Boosting and Augmentation Method Applied to Automatic Speaker Verification Anti-Spoofing.
ICASSP2021
Hemlata Tak, Jose Patino 0001, Massimiliano Todisco, Andreas Nautsch, Nicholas W. D. Evans, Anthony Larcher, 
End-to-End anti-spoofing with RawNet2.
Interspeech2021
Jose Patino 0001, Natalia A. Tomashenko, Massimiliano Todisco, Andreas Nautsch, Nicholas W. D. Evans, 
Speaker Anonymisation Using the McAdams Coefficient.
Interspeech2021
Oubaïda Chouchane, Baptiste Brossier, Jorge Esteban Gamboa Gamboa, Thomas Lardy, Hemlata Tak, Orhan Ermis, Madhu R. Kamble, Jose Patino 0001, Nicholas W. D. Evans, Melek Önen, Massimiliano Todisco, 
Privacy-Preserving Voice Anti-Spoofing Using Secure Multi-Party Computation.
Interspeech2021
Wanying Ge, Michele Panariello, Jose Patino 0001, Massimiliano Todisco, Nicholas W. D. Evans, 
Partially-Connected Differentiable Architecture Search for Deepfake and Spoofing Detection.
Interspeech2021
Madhu R. Kamble, José Andrés González López, Teresa Grau, Juan M. Espín, Lorenzo Cascioli, Yiqing Huang, Alejandro Gomez-Alanis, Jose Patino 0001, Roberto Font, Antonio M. Peinado, Angel M. Gomez, Nicholas W. D. Evans, Maria A. Zuluaga, Massimiliano Todisco, 
PANACEA Cough Sound-Based Diagnosis of COVID-19 for the DiCOVA 2021 Challenge.
Interspeech2021
Tomi Kinnunen, Andreas Nautsch, Md. Sahidullah, Nicholas W. D. Evans, Xin Wang 0037, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee, 
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing.
Interspeech2021
Hemlata Tak, Jee-weon Jung, Jose Patino 0001, Massimiliano Todisco, Nicholas W. D. Evans, 
Graph Attention Networks for Anti-Spoofing.
TASLP2020
Tomi Kinnunen, Héctor Delgado, Nicholas W. D. Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang 0037, Md. Sahidullah, Junichi Yamagishi, Douglas A. Reynolds, 
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals.
ICASSP2020
Pramod B. Bachhav, Massimiliano Todisco, Nicholas W. D. Evans, 
Artificial Bandwidth Extension Using Conditional Variational Auto-encoders and Adversarial Learning.
Interspeech2020
Andreas Nautsch, Jose Patino 0001, Natalia A. Tomashenko, Junichi Yamagishi, Paul-Gauthier Noé, Jean-François Bonastre, Massimiliano Todisco, Nicholas W. D. Evans, 
The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment.
Interspeech2020
Hemlata Tak, Jose Patino 0001, Andreas Nautsch, Nicholas W. D. Evans, Massimiliano Todisco, 
Spoofing Attack Detection Using the Non-Linear Fusion of Sub-Band Classifiers.
Interspeech2020
Natalia A. Tomashenko, Brij Mohan Lal Srivastava, Xin Wang 0037, Emmanuel Vincent 0001, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Jose Patino 0001, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco, 
Introducing the VoicePrivacy Initiative.
ICASSP2019
Pramod B. Bachhav, Massimiliano Todisco, Nicholas W. D. Evans, 
Latent Representation Learning for Artificial Bandwidth Extension Using a Conditional Variational Auto-encoder.
Interspeech2019
Kong Aik Lee, Ville Hautamäki, Tomi H. Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang 0019, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li 0001, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang 0039, Chunlei Zhang, Fahimeh Bahmaninezhad, Héctor Delgado, Massimiliano Todisco, 
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences.
Interspeech2019
Andreas Nautsch, Jose Patino 0001, Amos Treiber, Themos Stafylakis, Petr Mizera, Massimiliano Todisco, Thomas Schneider 0003, Nicholas W. D. Evans, 
Privacy-Preserving Speaker Recognition with Cohort Score Normalisation.
Interspeech2019
Andreas Nautsch, Catherine Jasserand, Els Kindt, Massimiliano Todisco, Isabel Trancoso, Nicholas W. D. Evans, 
The GDPR & Speech Data: Reflections of Legal and Technology Communities, First Steps Towards a Common Understanding.
Interspeech2019
Massimiliano Todisco, Xin Wang 0037, Ville Vestman, Md. Sahidullah, Héctor Delgado, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Tomi H. Kinnunen, Kong Aik Lee, 
ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection.
ICASSP2018
Pramod B. Bachhav, Massimiliano Todisco, Nicholas W. D. Evans, 
Efficient Super-Wide Bandwidth Extension Using Linear Prediction Based Analysis-Synthesis.
TASLP2022
Xiaoqiang Wang, Yanqing Liu, Jinyu Li 0001, Veljko Miljanic, Sheng Zhao, Hosam Khalil, 
Towards Contextual Spelling Correction for Customization of End-to-End Speech Recognition Systems.
ICASSP2022
Zehua Chen, Xu Tan 0003, Ke Wang, Shifeng Pan, Danilo P. Mandic, Lei He 0005, Sheng Zhao, 
Infergrad: Improving Diffusion Models for Vocoder by Considering Inference in Training.
ICASSP2022
Liyang Chen, Zhiyong Wu 0001, Jun Ling, Runnan Li, Xu Tan 0003, Sheng Zhao, 
Transformer-S2A: Robust and Efficient Speech-to-Animation.
ICASSP2022
Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan 0003, Sheng Zhao, Tan Lee, 
A Study on the Efficacy of Model Pre-Training In Developing Neural Text-to-Speech System.
Interspeech2022
Yanqing Liu, Ruiqing Xue, Lei He 0005, Xu Tan 0003, Sheng Zhao, 
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders.
Interspeech2022
Yihan Wu, Xu Tan 0003, Bohan Li 0003, Lei He 0005, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu, 
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios.
Interspeech2022
Dacheng Yin, Chuanxin Tang, Yanqing Liu, Xiaoqiang Wang, Zhiyuan Zhao, Yucheng Zhao, Zhiwei Xiong, Sheng Zhao, Chong Luo, 
RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion.
Interspeech2022
Guangyan Zhang, Kaitao Song, Xu Tan 0003, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao, 
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech.
NeurIPS2022
Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen, Xu Tan, Danilo P. Mandic, Lei He, Xiangyang Li, Tao Qin, Sheng Zhao, Tie-Yan Liu, 
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis.
ICASSP2021
Yichong Leng, Xu Tan 0003, Sheng Zhao, Frank K. Soong, Xiang-Yang Li 0001, Tao Qin, 
MBNET: MOS Prediction for Synthesized Speech with Mean-Bias Network.
ICASSP2021
Renqian Luo, Xu Tan 0003, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen, Tie-Yan Liu, 
Lightspeech: Lightweight and Fast Text to Speech with Neural Architecture Search.
ICASSP2021
Yuzi Yan, Xu Tan 0003, Bohan Li 0003, Tao Qin, Sheng Zhao, Yuan Shen 0001, Tie-Yan Liu, 
Adaspeech 2: Adaptive Text to Speech with Untranscribed Data.
ICASSP2021
Chen Zhang 0020, Yi Ren 0006, Xu Tan 0003, Jinglin Liu, Kejun Zhang, Tao Qin, Sheng Zhao, Tie-Yan Liu, 
Denoispeech: Denoising Text to Speech with Frame-Level Noise Modeling.
Interspeech2021
Xiaoqiang Wang, Yanqing Liu, Sheng Zhao, Jinyu Li 0001, 
A Light-Weight Contextual Spelling Correction Model for Customizing Transducer-Based Speech Recognition Systems.
Interspeech2021
Yuzi Yan, Xu Tan 0003, Bohan Li 0003, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan Shen 0001, Wei-Qiang Zhang, Tie-Yan Liu, 
Adaptive Text to Speech for Spontaneous Style.
ICLR2021
Yi Ren 0006, Chenxu Hu, Xu Tan 0003, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu, 
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.
ICLR2021
Mingjian Chen, Xu Tan 0003, Bohan Li 0003, Yanqing Liu, Tao Qin, Sheng Zhao, Tie-Yan Liu, 
AdaSpeech: Adaptive Text to Speech for Custom Voice.
Interspeech2020
Chengyi Wang 0002, Yu Wu 0012, Yujiao Du, Jinyu Li 0001, Shujie Liu 0001, Liang Lu 0001, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou 0001, 
Semantic Mask for Transformer Based End-to-End Speech Recognition.
Interspeech2020
Mingjian Chen, Xu Tan 0003, Yi Ren 0006, Jin Xu 0010, Hao Sun, Sheng Zhao, Tao Qin, 
MultiSpeech: Multi-Speaker Text to Speech with Transformer.
Interspeech2020
Naihan Li, Shujie Liu 0001, Yanqing Liu, Sheng Zhao, Ming Liu, Ming Zhou 0001, 
MoBoAligner: A Neural Alignment Model for Non-Autoregressive TTS with Monotonic Boundary Search.
SpeechComm2021
Tharshini Gunendradasan, Eliathamby Ambikairajah, Julien Epps, Vidhyasaharan Sethu, Haizhou Li 0001, 
An adaptive transmission line cochlear model based front-end for replay attack detection.
Interspeech2021
Beena Ahmed, Kirrie J. Ballard, Denis Burnham, Tharmakulasingam Sirojan, Hadi Mehmood, Dominique Estival, Elise Baker, Felicity Cox, Joanne Arciuli, Titia Benders, Katherine Demuth, Barbara Kelly, Chloé Diskin-Holdaway, Mostafa Ali Shahin, Vidhyasaharan Sethu, Julien Epps, Chwee Beng Lee, Eliathamby Ambikairajah, 
AusKidTalk: An Auditory-Visual Corpus of 3- to 12-Year-Old Australian Children's Speech.
Interspeech2021
Deboshree Bose, Vidhyasaharan Sethu, Eliathamby Ambikairajah, 
Parametric Distributions to Model Numerical Emotion Labels.
ICASSP2020
Eliathamby Ambikairajah, Vidhyasaharan Sethu, 
Cochlear Signal Processing: A Platform for Learning the Fundamentals of Digital Signal Processing.
ICASSP2020
Gajan Suthokumar, Vidhyasaharan Sethu, Kaavya Sriskandaraja, Eliathamby Ambikairajah, 
Adversarial Multi-Task Learning for Speaker Normalization in Replay Detection.
ICASSP2019
Gajan Suthokumar, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Chamith Wijenayake, Eliathamby Ambikairajah, 
Phoneme Specific Modelling and Scoring Techniques for Anti Spoofing System.
ICASSP2019
Buddhi Wickramasinghe, Eliathamby Ambikairajah, Julien Epps, Vidhyasaharan Sethu, Haizhou Li 0001, 
Auditory Inspired Spatial Differentiation for Replay Spoofing Attack Detection.
Interspeech2019
Anda Ouyang, Ting Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah, 
Speech Based Emotion Prediction: Can a Linear Model Work?
SpeechComm2018
Saad Irtza, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Haizhou Li 0001, 
Using language cluster models in hierarchical language identification.
ICASSP2018
Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah, 
Factorized Hidden Variability Learning for Adaptation of Short Duration Language Identification Models.
ICASSP2018
Jianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong-Aik Lee, 
Speaker-Phonetic Vector Estimation for Short Duration Speaker Verification.
Interspeech2018
Mia Atcheson, Vidhyasaharan Sethu, Julien Epps, 
Demonstrating and Modelling Systematic Time-varying Annotator Disagreement in Continuous Emotion Annotation.
Interspeech2018
Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah, 
Sub-band Envelope Features Using Frequency Domain Linear Prediction for Short Duration Language Identification.
Interspeech2018
Kaavya Sriskandaraja, Vidhyasaharan Sethu, Eliathamby Ambikairajah, 
Deep Siamese Architecture Based Replay Detection for Secure Voice Biometric.
Interspeech2018
Gajan Suthokumar, Vidhyasaharan Sethu, Chamith Wijenayake, Eliathamby Ambikairajah, 
Modulation Dynamic Features for the Detection of Replay Attacks.
Interspeech2017
Ting Dang, Vidhyasaharan Sethu, Julien Epps, Eliathamby Ambikairajah, 
An Investigation of Emotion Prediction Uncertainty Using Gaussian Mixture Regression.
Interspeech2017
Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Julien Epps, 
Bidirectional Modelling for Short Duration Language Identification.
Interspeech2017
Saad Irtza, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Haizhou Li 0001, 
Investigating Scalability in Hierarchical Language Identification System.
Interspeech2017
Kong-Aik Lee, Ville Hautamäki, Tomi Kinnunen, Anthony Larcher, Chunlei Zhang, Andreas Nautsch, Themos Stafylakis, Gang Liu 0001, Mickaël Rouvier, Wei Rao, Federico Alegre, J. Ma, Man-Wai Mak, Achintya Kumar Sarkar, Héctor Delgado, Rahim Saeidi, Hagai Aronowitz, Aleksandr Sizov, Hanwu Sun, Trung Hieu Nguyen 0001, G. Wang, Bin Ma 0001, Ville Vestman, Md. Sahidullah, M. Halonen, Anssi Kanervisto, Gaël Le Lan, Fahimeh Bahmaninezhad, Sergey Isadskiy, Christian Rathgeb, Christoph Busch 0001, Georgios Tzimiropoulos, Q. Qian, Z. Wang, Q. Zhao, T. Wang, H. Li, J. Xue, S. Zhu, R. Jin, T. Zhao, Pierre-Michel Bousquet, Moez Ajili, Waad Ben Kheder, Driss Matrouf, Zhi Hao Lim, Chenglin Xu, Haihua Xu, Xiong Xiao, Eng Siong Chng, Benoit G. B. Fauve, Kaavya Sriskandaraja, Vidhyasaharan Sethu, W. W. Lin, Dennis Alexander Lehmann Thomsen, Zheng-Hua Tan, Massimiliano Todisco, Nicholas W. D. Evans, Haizhou Li 0001, John H. L. Hansen, Jean-François Bonastre, Eliathamby Ambikairajah, 
The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016.
Interspeech2017
Jianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong-Aik Lee, 
Incorporating Local Acoustic Variability Information into Short Duration Speaker Verification.
ICASSP2022
Zohreh Mostaani, RaviShankar Prasad, Bogdan Vlasenko, Mathew Magimai-Doss, 
Modeling of Pre-Trained Neural Network Embeddings Learned From Raw Waveform for COVID-19 Infection Detection.
Interspeech2022
Zohreh Mostaani, Mathew Magimai-Doss, 
On Breathing Pattern Information in Synthetic Speech.
Interspeech2022
Eklavya Sarkar, RaviShankar Prasad, Mathew Magimai-Doss, 
Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering.
ICASSP2021
Zohreh Mostaani, Venkata Srikanth Nallanthighal, Aki Härmä, Helmer Strik, Mathew Magimai-Doss, 
On The Relationship Between Speech-Based Breathing Signal Prediction Evaluation Measures and Breathing Parameters Estimation.
Interspeech2021
Enno Hermann, Mathew Magimai-Doss, 
Handling Acoustic Variation in Dysarthric Speech Recognition Systems Through Model Combination.
Interspeech2021
RaviShankar Prasad, Mathew Magimai-Doss, 
Identification of F1 and F2 in Speech Using Modified Zero Frequency Filtering.
Interspeech2021
Juan Camilo Vásquez-Correa, Julian Fritsch, Juan Rafael Orozco-Arroyave, Elmar Nöth, Mathew Magimai-Doss, 
On Modeling Glottal Source Information for Phonation Assessment in Parkinson's Disease.
Interspeech2021
Esaú Villatoro-Tello, S. Pavankumar Dubagunta, Julian Fritsch, Gabriela Ramírez-de-la-Rosa, Petr Motlícek, Mathew Magimai-Doss, 
Late Fusion of the Available Lexicon and Raw Waveform-Based Acoustic Modeling for Depression and Dementia Recognition.
ICASSP2020
Enno Hermann, Mathew Magimai-Doss, 
Dysarthric Speech Recognition with Lattice-Free MMI.
ICASSP2020
RaviShankar Prasad, Gürkan Yilmaz, Olivier Chételat, Mathew Magimai-Doss, 
Detection Of S1 And S2 Locations In Phonocardiogram Signals Using Zero Frequency Filter.
ICASSP2020
Sandrine Tornay, Marzieh Razavi, Mathew Magimai-Doss, 
Towards Multilingual Sign Language Recognition.
Interspeech2020
Nicholas Cummins, Yilin Pan, Zhao Ren, Julian Fritsch, Venkata Srikanth Nallanthighal, Heidi Christensen, Daniel Blackburn, Björn W. Schuller, Mathew Magimai-Doss, Helmer Strik, Aki Härmä, 
A Comparison of Acoustic and Linguistics Methodologies for Alzheimer's Dementia Recognition.
SpeechComm2019
Dimitri Palaz, Mathew Magimai-Doss, Ronan Collobert, 
End-to-end acoustic modeling using convolutional neural networks for HMM-based automatic speech recognition.
ICASSP2019
S. Pavankumar Dubagunta, Selen Hande Kabil, Mathew Magimai-Doss, 
Improving Children Speech Recognition through Feature Learning from Raw Speech Signal.
ICASSP2019
S. Pavankumar Dubagunta, Mathew Magimai-Doss, 
Segment-level Training of ANNs Based on Acoustic Confidence Measures for Hybrid HMM/ANN Speech Recognition.
ICASSP2019
S. Pavankumar Dubagunta, Bogdan Vlasenko, Mathew Magimai-Doss, 
Learning Voice Source Related Information for Depression Detection.
ICASSP2019
Sandrine Tornay, Marzieh Razavi, Necati Cihan Camgöz, Richard Bowden, Mathew Magimai-Doss, 
HMM-based Approaches to Model Multichannel Information in Sign Language Inspired from Articulatory Features-based Speech Processing.
Interspeech2019
S. Pavankumar Dubagunta, Mathew Magimai-Doss, 
Using Speech Production Knowledge for Raw Waveform Modelling Based Styrian Dialect Identification.
Interspeech2019
Hannah Muckenhirn, Vinayak Abrol, Mathew Magimai-Doss, Sébastien Marcel, 
Understanding and Visualizing Raw Waveform-Based CNNs.
SpeechComm2018
Marzieh Razavi, Ramya Rasipuram, Mathew Magimai-Doss, 
Towards weakly supervised acoustic subword unit discovery and lexicon development using hidden Markov models.
ICASSP2022
Benjamin van Niekerk, Marc-André Carbonneau, Julian Zaïdi, Matthew Baas, Hugo Seuté, Herman Kamper, 
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.
Interspeech2022
Matthew Baas, Herman Kamper, 
Voice Conversion Can Improve ASR in Very Low-Resource Settings.
Interspeech2022
Werner van der Merwe, Herman Kamper, Johan Adam du Preez, 
A Temporal Extension of Latent Dirichlet Allocation for Unsupervised Acoustic Unit Discovery.
TASLP2021
Herman Kamper, Yevgen Matusevych, Sharon Goldwater, 
Improved Acoustic Word Embeddings for Zero-Resource Languages Using Multilingual Transfer.
Interspeech2021
Christiaan Jacobs, Herman Kamper, 
Multilingual Transfer of Acoustic Word Embeddings Improves When Training on Languages Related to the Target Zero-Resource Language.
Interspeech2021
Herman Kamper, Benjamin van Niekerk, 
Towards Unsupervised Phone and Word Segmentation Using Self-Supervised Vector-Quantized Neural Networks.
Interspeech2021
Benjamin van Niekerk, Leanne Nortje, Matthew Baas, Herman Kamper, 
Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing.
Interspeech2021
Leanne Nortje, Herman Kamper, 
Direct Multimodal Few-Shot Learning of Speech and Images.
Interspeech2021
Kayode Olaleye, Herman Kamper, 
Attention-Based Keyword Localisation in Speech Using Visual Grounding.
ICASSP2020
Sameer Bansal, Herman Kamper, Adam Lopez, Sharon Goldwater, 
Cross-Lingual Topic Prediction For Speech Using Translations.
ICASSP2020
Herman Kamper, Yevgen Matusevych, Sharon Goldwater, 
Multilingual Acoustic Word Embedding Models for Processing Zero-resource Languages.
Interspeech2020
Benjamin van Niekerk, Leanne Nortje, Herman Kamper, 
Vector-Quantized Neural Networks for Acoustic Unit Discovery in the ZeroSpeech 2020 Challenge.
Interspeech2020
Leanne Nortje, Herman Kamper, 
Unsupervised vs. Transfer Learning for Multimodal One-Shot Matching of Speech and Images.
TASLP2019
Herman Kamper, Gregory Shakhnarovich, Karen Livescu, 
Semantic Speech Retrieval With a Visually Grounded Model of Untranscribed Speech.
ICASSP2019
Ryan Eloff, Herman A. Engelbrecht, Herman Kamper, 
Multimodal One-shot Learning of Speech and Images.
ICASSP2019
Herman Kamper, 
Truly Unsupervised Acoustic Word Embeddings Using Weak Top-down Constraints in Encoder-decoder Models.
ICASSP2019
Herman Kamper, Aristotelis Anastassiou, Karen Livescu, 
Semantic Query-by-example Speech Search Using Visual Grounding.
Interspeech2019
Ryan Eloff, André Nortje, Benjamin van Niekerk, Avashna Govender, Leanne Nortje, Arnu Pretorius, Elan Van Biljon, Ewald van der Westhuizen, Lisa van Staden, Herman Kamper, 
Unsupervised Acoustic Unit Discovery for Speech Synthesis Using Discrete Latent-Variable Neural Networks.
Interspeech2019
Raghav Menon, Herman Kamper, Ewald van der Westhuizen, John A. Quinn, Thomas Niesler, 
Feature Exploration for Almost Zero-Resource ASR-Free Keyword Spotting Using a Multilingual Bottleneck Extractor and Correspondence Autoencoders.
Interspeech2019
Ankita Pasad, Bowen Shi, Herman Kamper, Karen Livescu, 
On the Contributions of Visual and Textual Supervision in Low-Resource Semantic Speech Retrieval.
Interspeech2022
Xiaodong Cui, George Saon, Tohru Nagano, Masayuki Suzuki, Takashi Fukuda, Brian Kingsbury, Gakuto Kurata, 
Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing.
Interspeech2022
Takashi Fukuda, Samuel Thomas 0001, Masayuki Suzuki, Gakuto Kurata, George Saon, Brian Kingsbury, 
Global RNN Transducer Models For Multi-dialect Speech Recognition.
Interspeech2022
Sashi Novitasari, Takashi Fukuda, Gakuto Kurata, 
Improving ASR Robustness in Noisy Condition Through VAD Integration.
Interspeech2022
Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Nobuyasu Itoh, George Saon, 
Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems.
ICASSP2021
Samuel Thomas 0001, Hong-Kwang Jeff Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, 
RNN Transducer Models for Spoken Language Understanding.
Interspeech2021
Gakuto Kurata, George Saon, Brian Kingsbury, David Haws, Zoltán Tüske, 
Improving Customization of Neural Transducers by Mitigating Acoustic Mismatch of Synthesized Audio.
ICASSP2020
Shintaro Ando, Masayuki Suzuki, Nobuyasu Itoh, Gakuto Kurata, Nobuaki Minematsu, 
Converting Written Language to Spoken Language with Neural Machine Translation for Language Modeling.
ICASSP2020
Yosuke Higuchi, Masayuki Suzuki, Gakuto Kurata, 
Speaker Embeddings Incorporating Acoustic Conditions for Diarization.
Interspeech2020
Hagai Aronowitz, Weizhong Zhu, Masayuki Suzuki, Gakuto Kurata, Ron Hoory, 
New Advances in Speaker Diarization.
Interspeech2020
Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas 0001, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis A. Lastras, 
End-to-End Spoken Language Understanding Without Full Transcripts.
Interspeech2020
Gakuto Kurata, George Saon, 
Knowledge Distillation from Offline to Streaming RNN Transducer for End-to-End Speech Recognition.
ICASSP2019
Masayuki Suzuki, Nobuyasu Itoh, Tohru Nagano, Gakuto Kurata, Samuel Thomas 0001, 
Improvements to N-gram Language Model Using Text Generated from Neural Language Model.
ICASSP2019
Samuel Thomas 0001, Masayuki Suzuki, Yinghui Huang, Gakuto Kurata, Zoltán Tüske, George Saon, Brian Kingsbury, Michael Picheny, Tom Dibert, Alice Kaiser-Schatzlein, Bern Samko, 
English Broadcast News Speech Recognition by Humans and Machines.
Interspeech2019
Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, 
Direct Neuron-Wise Fusion of Cognate Neural Networks.
Interspeech2019
Gakuto Kurata, Kartik Audhkhasi, 
Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation.
Interspeech2019
Gakuto Kurata, Kartik Audhkhasi, 
Multi-Task CTC Training with Auxiliary Feature Reconstruction for End-to-End Speech Recognition.
Interspeech2018
Takashi Fukuda, Raul Fernandez, Andrew Rosenberg, Samuel Thomas 0001, Bhuvana Ramabhadran, Alexander Sorin, Gakuto Kurata, 
Data Augmentation Improves Recognition of Foreign Accented Speech.
Interspeech2018
Masayuki Suzuki, Tohru Nagano, Gakuto Kurata, Samuel Thomas 0001, 
Inference-Invariant Transformation of Batch Normalization for Domain Adaptation of Acoustic Models.
ICASSP2017
Takashi Fukuda, Osamu Ichikawa, Gakuto Kurata, Ryuki Tachibana, Samuel Thomas 0001, Bhuvana Ramabhadran, 
Effective joint training of denoising feature space transforms and Neural Network based acoustic models.
ICASSP2017
Osamu Ichikawa, Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, Bhuvana Ramabhadran, 
Harmonic feature fusion for robust neural network-based acoustic modeling.
SpeechComm2023
Paula Andrea Pérez-Toro, Tomás Arias-Vergara, Philipp Klumpp, Juan Camilo Vásquez-Correa, Maria Schuster, Elmar Nöth, Juan Rafael Orozco-Arroyave, 
Depression assessment in people with Parkinson's disease: The combination of acoustic features and natural language processing.
Interspeech2022
Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas K. Maier, Seung Hee Yang, 
Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition.
Interspeech2022
Paula Andrea Pérez-Toro, Philipp Klumpp, Abner Hernandez, Tomas Arias, Patricia Lillo, Andrea Slachevsky, Adolfo Martín García, Maria Schuster, Andreas K. Maier, Elmar Nöth, Juan Rafael Orozco-Arroyave, 
Alzheimer's Detection from English to Spanish Using Acoustic and Linguistic Embeddings.
Interspeech2022
P. Schäfer, Paula Andrea Pérez-Toro, Philipp Klumpp, Juan Rafael Orozco-Arroyave, Elmar Nöth, Andreas K. Maier, A. Abad, Maria Schuster, Tomás Arias-Vergara, 
CoachLea: an Android Application to Evaluate the Speech Production and Perception of Children with Hearing Loss.
ICASSP2021
Paula Andrea Pérez-Toro, Juan Camilo Vásquez-Correa, Tomas Arias-Vergara, Philipp Klumpp, M. Sierra-Castrillón, M. E. Roldán-López, D. Aguillón, L. Hincapié-Henao, Carlos Andrés Tobón-Quintero, Tobias Bocklet, Maria Schuster, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
Acoustic and Linguistic Analyses to Assess Early-Onset and Genetic Alzheimer's Disease.
ICASSP2021
Juan Camilo Vásquez-Correa, Tomas Arias-Vergara, Philipp Klumpp, Paula Andrea Pérez-Toro, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
End-2-End Modeling of Speech and Gait from Patients with Parkinson's Disease: Comparison Between High Quality Vs. Smartphone Data.
Interspeech2021
Philipp Klumpp, Tobias Bocklet, Tomas Arias-Vergara, Juan Camilo Vásquez-Correa, Paula Andrea Pérez-Toro, Sebastian P. Bayerl, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
The Phonetic Footprint of Covid-19?
Interspeech2021
Paula Andrea Pérez-Toro, Sebastian P. Bayerl, Tomas Arias-Vergara, Juan Camilo Vásquez-Correa, Philipp Klumpp, Maria Schuster, Elmar Nöth, Juan Rafael Orozco-Arroyave, Korbinian Riedhammer, 
Influence of the Interviewer on the Automatic Assessment of Alzheimer's Disease in the Context of the ADReSSo Challenge.
Interspeech2021
Juan Camilo Vásquez-Correa, Julian Fritsch, Juan Rafael Orozco-Arroyave, Elmar Nöth, Mathew Magimai-Doss, 
On Modeling Glottal Source Information for Phonation Assessment in Parkinson's Disease.
SpeechComm2020
Juan Camilo Vásquez-Correa, Tomas Arias-Vergara, Maria Schuster, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
Parallel Representation Learning for the Classification of Pathological Speech: Studies on Parkinson's Disease and Cleft Lip and Palate.
ICASSP2020
Juan Camilo Vásquez-Correa, Tobias Bocklet, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
Comparison of User Models Based on GMM-UBM and I-Vectors for Speech, Handwriting, and Gait Assessment of Parkinson's Disease Patients.
Interspeech2020
Philipp Klumpp, Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Paula Andrea Pérez-Toro, Florian Hönig, Elmar Nöth, Juan Rafael Orozco-Arroyave, 
Surgical Mask Detection with Deep Recurrent Phonetic Models.
Interspeech2019
Tomas Arias-Vergara, Juan Rafael Orozco-Arroyave, Milos Cernak, Sandra Gollwitzer, Maria Schuster, Elmar Nöth, 
Phone-Attribute Posteriors to Evaluate the Speech of Cochlear Implant Users.
Interspeech2019
José Vicente Egas López, Juan Rafael Orozco-Arroyave, Gábor Gosztolya, 
Assessing Parkinson's Disease from Speech Using Fisher Vectors.
Interspeech2019
Alice Rueda, Juan Camilo Vásquez-Correa, Cristian David Rios-Urrego, Juan Rafael Orozco-Arroyave, Sridhar Krishnan 0001, Elmar Nöth, 
Feature Representation of Pathophysiology of Parkinsonian Dysarthria.
Interspeech2019
Juan Camilo Vásquez-Correa, Tomas Arias-Vergara, Philipp Klumpp, M. Strauss, Arne Küderle, Nils Roth, S. Bayerl, Nicanor García-Ospina, Paula Andrea Pérez-Toro, L. Felipe Parra-Gallego, Cristian David Rios-Urrego, Daniel Escobar-Grisales, Juan Rafael Orozco-Arroyave, Björn M. Eskofier, Elmar Nöth, 
Apkinson: A Mobile Solution for Multimodal Assessment of Patients with Parkinson's Disease.
Interspeech2019
Juan Camilo Vásquez-Correa, Philipp Klumpp, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
Phonet: A Tool Based on Gated Recurrent Neural Networks to Extract Phonological Posteriors from Speech.
SpeechComm2018
Tomas Arias-Vergara, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
Speaker models for monitoring Parkinson's disease progression considering different communication channels and acoustic conditions.
ICASSP2018
Tomas Arias-Vergara, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Philipp Klumpp, Elmar Nöth, 
Unobtrusive Monitoring of Speech Impairments of Parkinson's Disease Patients Through Mobile Devices.
Interspeech2018
Nicanor García, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
Multimodal I-vectors to Detect and Evaluate Parkinson's Disease.
ICASSP2022
Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, 
An Investigation of Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion.
ICASSP2022
Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe 0001, Tomoki Toda, 
S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations.
Interspeech2022
Jiatong Shi, Shuai Guo, Tao Qian, Tomoki Hayashi, Yuning Wu, Fangzheng Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe 0001, Qin Jin, 
Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis.
TASLP2021
Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda, 
Pretraining Techniques for Sequence-to-Sequence Voice Conversion.
TASLP2021
Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda, 
Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.
TASLP2021
Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda, 
Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.
ICASSP2021
Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi 0003, Shinji Watanabe 0001, Kun Wei, Wangyou Zhang, Yuekai Zhang, 
Recent Developments on ESPnet Toolkit Boosted by Conformer.
ICASSP2021
Wen-Chin Huang, Yi-Chiao Wu, Tomoki Hayashi, 
Any-to-One Sequence-to-Sequence Voice Conversion Using Self-Supervised Discrete Speech Representations.
ICASSP2021
Kazuhiro Kobayashi, Wen-Chin Huang, Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda, 
Crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder.
Interspeech2021
Tatsuya Komatsu, Shinji Watanabe 0001, Koichi Miyazaki, Tomoki Hayashi, 
Acoustic Event Detection with Classifier Chains.
ICASSP2020
Koichi Miyazaki, Tatsuya Komatsu, Tomoki Hayashi, Shinji Watanabe 0001, Tomoki Toda, Kazuya Takeda, 
Weakly-Supervised Sound Event Detection with Self-Attention.
ICASSP2020
Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, 
Efficient Shallow WaveNet Vocoder Using Multiple Samples Output Based on Laplacian Distribution and Linear Prediction.
ICASSP2020
Takenori Yoshimura, Tomoki Hayashi, Kazuya Takeda, Shinji Watanabe 0001, 
End-to-End Automatic Speech Recognition Integrated with CTC-Based Voice Activity Detection.
Interspeech2020
Shu Hikosaka, Shogo Seki, Tomoki Hayashi, Kazuhiro Kobayashi, Kazuya Takeda, Hideki Banno, Tomoki Toda, 
Intelligibility Enhancement Based on Speech Waveform Modification Using Hearing Impairment.
Interspeech2020
Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda, 
Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining.
Interspeech2020
Patrick Lumban Tobing, Tomoki Hayashi, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda, 
Cyclic Spectral Modeling for Unsupervised Unit Discovery into Voice Conversion with Excitation and Waveform Modeling.
Interspeech2020
Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda, 
Quasi-Periodic Parallel WaveGAN Vocoder: A Non-Autoregressive Pitch-Dependent Dilated Convolution Model for Parametric Speech Generation.
ICASSP2019
Takaaki Hori, Ramón Fernandez Astudillo, Tomoki Hayashi, Yu Zhang 0033, Shinji Watanabe 0001, Jonathan Le Roux, 
Cycle-consistency Training for End-to-end Speech Recognition.
ICASSP2019
Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, 
Voice Conversion with Cyclic Recurrent Neural Network and Fine-tuned WaveNet Vocoder.
Interspeech2019
Tomoki Hayashi, Shinji Watanabe 0001, Tomoki Toda, Kazuya Takeda, Shubham Toshniwal, Karen Livescu, 
Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis.
ICASSP2022
Takafumi Moriya, Takanori Ashihara, Atsushi Ando, Hiroshi Sato, Tomohiro Tanaka, Kohei Matsuura, Ryo Masumura, Marc Delcroix, Takahiro Shinozaki, 
Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration.
Interspeech2022
Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, 
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models.
Interspeech2022
Ryo Masumura, Yoshihiro Yamazaki, Saki Mizuno, Naoki Makishima, Mana Ihori, Mihiro Uchida, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Shota Orihashi, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando, 
End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training.
Interspeech2022
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura, 
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.
Interspeech2022
Tomohiro Tanaka, Ryo Masumura, Hiroshi Sato, Mana Ihori, Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, 
Domain Adversarial Self-Supervised Speech Representation Learning for Improving Unknown Domain Downstream Tasks.
ICASSP2021
Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura, 
MAPGN: Masked Pointer-Generator Network for Sequence-to-Sequence Pre-Training.
ICASSP2021
Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, Ryo Masumura, 
Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss.
ICASSP2021
Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, 
Hierarchical Transformer-Based Large-Context End-To-End ASR with Large-Context Knowledge Distillation.
ICASSP2021
Takafumi Moriya, Takanori Ashihara, Tomohiro Tanaka, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Yusuke Ijima, Ryo Masumura, Yusuke Shinohara, 
Simpleflat: A Simple Whole-Network Pre-Training Approach for RNN Transducer-Based End-to-End Speech Recognition.
Interspeech2021
Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura, 
Zero-Shot Joint Modeling of Multiple Spoken-Text-Style Conversion Tasks Using Switching Tokens.
Interspeech2021
Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura, 
Enrollment-Less Training for Personalized Voice Activity Detection.
Interspeech2021
Ryo Masumura, Daiki Okamura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, 
Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation.
Interspeech2021
Takafumi Moriya, Tomohiro Tanaka, Takanori Ashihara, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Ryo Masumura, Marc Delcroix, Taichi Asami, 
Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture.
Interspeech2021
Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Takafumi Moriya, Takanori Ashihara, Shota Orihashi, Naoki Makishima, 
Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition.
Interspeech2021
Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Shota Orihashi, Naoki Makishima, 
End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning.
ICASSP2020
Shengzhou Gao, Wenxin Hou, Tomohiro Tanaka, Takahiro Shinozaki, 
Spoken Language Acquisition Based on Reinforcement Learning and Word Unit Segmentation.
ICASSP2020
Takafumi Moriya, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara, 
Distilling Attention Weights for CTC-Based ASR Systems.
Interspeech2020
Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, 
Phoneme-to-Grapheme Conversion Based Large-Scale Pre-Training for End-to-End Automatic Speech Recognition.
Interspeech2020
Takafumi Moriya, Tsubasa Ochiai, Shigeki Karita, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara, Marc Delcroix, 
Self-Distillation for Improving CTC-Transformer-Based ASR Systems.
Interspeech2020
Shota Orihashi, Mana Ihori, Tomohiro Tanaka, Ryo Masumura, 
Unsupervised Domain Adaptation for Dialogue Sequence Labeling Based on Hierarchical Adversarial Training.
ICASSP2022
Gábor Gosztolya, László Tóth 0001, Veronika Svindt, Judit Bóna, Ildikó Hoffmann, 
Using Acoustic Deep Neural Network Embeddings to Detect Multiple Sclerosis From Speech.
ICASSP2022
José Vicente Egas López, Gábor Kiss, Dávid Sztahó, Gábor Gosztolya, 
Automatic Assessment of the Degree of Clinical Depression from Speech Using X-Vectors.
ICASSP2022
Mercedes Vetráb, José Vicente Egas López, Réka Balogh, Nóra Imre, Ildikó Hoffmann, László Tóth 0001, Magdolna Pákáski, János Kálmán, Gábor Gosztolya, 
Using Spectral Sequence-to-Sequence Autoencoders to Assess Mild Cognitive Impairment.
TASLP2021
Gábor Gosztolya, Róbert Busa-Fekete, 
Ensemble Bag-of-Audio-Words Representation Improves Paralinguistic Classification Accuracy.
ICASSP2021
José Vicente Egas López, Gábor Gosztolya, 
Deep Neural Network Embeddings for the Estimation of the Degree of Sleepiness.
Interspeech2021
José Vicente Egas López, Mercedes Vetráb, László Tóth 0001, Gábor Gosztolya, 
Identifying Conflict Escalation and Primates by Using Ensemble X-Vectors and Fisher Vector Features.
Interspeech2021
Amin Honarmandi Shandiz, László Tóth 0001, Gábor Gosztolya, Alexandra Markó, Tamás Gábor Csapó, 
Neural Speaker Embeddings for Ultrasound-Based Silent Speech Interfaces.
Interspeech2020
Tamás Gábor Csapó, Csaba Zainkó, László Tóth 0001, Gábor Gosztolya, Alexandra Markó, 
Ultrasound-Based Articulatory-to-Acoustic Mapping with WaveGlow Speech Synthesis.
Interspeech2020
Gábor Gosztolya, 
Very Short-Term Conflict Intensity Estimation Using Fisher Vectors.
Interspeech2020
Gábor Gosztolya, Anita Bagi, Szilvia Szalóki, István Szendi, Ildikó Hoffmann, 
Making a Distinction Between Schizophrenia and Bipolar Disorder Based on Temporal Parameters in Spontaneous Speech.
Interspeech2019
Tamás Gábor Csapó, Mohammed Salah Al-Radhi, Géza Németh, Gábor Gosztolya, Tamás Grósz, László Tóth 0001, Alexandra Markó, 
Ultrasound-Based Silent Speech Interface Built on a Continuous Vocoder.
Interspeech2019
Gábor Gosztolya, 
Using Fisher Vector and Bag-of-Audio-Words Representations to Identify Styrian Dialects, Sleepiness, Baby & Orca Sounds.
Interspeech2019
Gábor Gosztolya, 
Using the Bag-of-Audio-Word Feature Representation of ASR DNN Posteriors for Paralinguistic Classification.
Interspeech2019
Gábor Gosztolya, László Tóth 0001, 
Calibrating DNN Posterior Probability Estimates of HMM/DNN Models to Improve Social Signal Detection from Audio Data.
Interspeech2019
José Vicente Egas López, Juan Rafael Orozco-Arroyave, Gábor Gosztolya, 
Assessing Parkinson's Disease from Speech Using Fisher Vectors.
ICASSP2018
Tamás Grósz, Gábor Gosztolya, László Tóth 0001, Tamás Gábor Csapó, Alexandra Markó, 
F0 Estimation for DNN-Based Ultrasound Silent Speech Interfaces.
Interspeech2018
Gábor Gosztolya, Anita Bagi, Szilvia Szalóki, István Szendi, Ildikó Hoffmann, 
Identifying Schizophrenia Based on Temporal Parameters in Spontaneous Speech.
Interspeech2018
Gábor Gosztolya, Tamás Grósz, László Tóth 0001, 
General Utterance-Level Feature Extraction for Classifying Crying Sounds, Atypical & Self-Assessed Affect and Heart Beats.
Interspeech2018
László Tóth 0001, Gábor Gosztolya, Tamás Grósz, Alexandra Markó, Tamás Gábor Csapó, 
Multi-Task Learning of Speech Recognition and Speech Synthesis Parameters for Ultrasound-based Silent Speech Interfaces.
Interspeech2018
Máté Ákos Tündik, György Szaszák, Gábor Gosztolya, András Beke, 
User-centric Evaluation of Automatic Punctuation in ASR Closed Captioning.
ICASSP2022
Ke Hu, Tara N. Sainath, Arun Narayanan, Ruoming Pang, Trevor Strohman, 
Transducer-Based Streaming Deliberation for Cascaded Encoders.
ICASSP2022
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.
Interspeech2022
W. Ronny Huang, Cal Peyser, Tara N. Sainath, Ruoming Pang, Trevor D. Strohman, Shankar Kumar, 
Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition.
Interspeech2022
Bo Li 0028, Tara N. Sainath, Ruoming Pang, Shuo-Yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani, 
A Language Agnostic Multilingual Streaming On-Device ASR System.
ICASSP2021
Thibault Doutre, Wei Han 0002, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang 0033, Liangliang Cao, 
Improving Streaming Automatic Speech Recognition with Non-Streaming Model Distillation on Unsupervised Data.
ICASSP2021
Bo Li 0028, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han 0002, Qiao Liang, Yu Zhang 0033, Trevor Strohman, Yonghui Wu, 
A Better and Faster end-to-end Model for Streaming ASR.
ICASSP2021
Zhaofeng Wu, Ding Zhao, Qiao Liang, Jiahui Yu, Anmol Gulati, Ruoming Pang, 
Dynamic Sparsity Neural Networks for Automatic Speech Recognition.
ICASSP2021
Jiahui Yu, Chung-Cheng Chiu, Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han 0002, Anmol Gulati, Yonghui Wu, Ruoming Pang, 
FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.
Interspeech2021
Thibault Doutre, Wei Han 0002, Chung-Cheng Chiu, Ruoming Pang, Olivier Siohan, Liangliang Cao, 
Bridging the Gap Between Streaming and Non-Streaming ASR Systems by Distilling Ensembles of CTC and RNN-T Models.
Interspeech2021
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Ruoming Pang, David Rybach, Cyril Allauzen, Ehsan Variani, James Qin, Quoc-Nam Le-The, Shuo-Yiin Chang, Bo Li 0028, Anmol Gulati, Jiahui Yu, Chung-Cheng Chiu, Diamantino Caseiro, Wei Li 0133, Qiao Liang, Pat Rondon, 
An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling.
Interspeech2021
Andros Tjandra, Ruoming Pang, Yu Zhang 0033, Shigeki Karita, 
Unsupervised Learning of Disentangled Speech Content and Style Representation.
ICLR2021
Jiahui Yu, Wei Han 0002, Anmol Gulati, Chung-Cheng Chiu, Bo Li 0028, Tara N. Sainath, Yonghui Wu, Ruoming Pang, 
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling.
ICASSP2020
Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prabhavalkar, 
Deliberation Model Based Two-Pass End-To-End Speech Recognition.
ICASSP2020
Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Ruoming Pang, Yanzhang He, Trevor Strohman, Yonghui Wu, 
Towards Fast and Accurate Streaming End-To-End ASR.
ICASSP2020
Tara N. Sainath, Yanzhang He, Bo Li 0028, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li 0133, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alexander Gruenstein, Ke Hu, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirkó Visontai, Yonghui Wu, Yu Zhang 0033, Ding Zhao, 
A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency.
ICASSP2020
Tara N. Sainath, Ruoming Pang, Ron J. Weiss, Yanzhang He, Chung-Cheng Chiu, Trevor Strohman, 
An Attention-Based Joint Acoustic and Text on-Device End-To-End Model.
Interspeech2020
Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang 0033, Jiahui Yu, Wei Han 0002, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang, 
Conformer: Convolution-augmented Transformer for Speech Recognition.
Interspeech2020
Wei Han 0002, Zhengdong Zhang, Yu Zhang 0033, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu, 
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context.
Interspeech2020
Wei Li 0133, James Qin, Chung-Cheng Chiu, Ruoming Pang, Yanzhang He, 
Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition.
Interspeech2020
Cal Peyser, Sepand Mavandadi, Tara N. Sainath, James Apfel, Ruoming Pang, Shankar Kumar, 
Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus.
SpeechComm2023
Qiujia Li, Chao Zhang 0031, Philip C. Woodland, 
Combining hybrid DNN-HMM ASR systems with attention-based models using lattice rescoring.
ICASSP2022
Qiujia Li, Yu Zhang 0033, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland, 
Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition.
ICASSP2022
Xiaoyu Yang, Qiujia Li, Philip C. Woodland, 
Knowledge Distillation for Neural Transducers from Large Self-Supervised Pre-Trained Models.
Interspeech2022
Guangzhi Sun, Chao Zhang 0031, Philip C. Woodland, 
Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition.
Interspeech2022
Xianrui Zheng, Chao Zhang 0031, Philip C. Woodland, 
Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription.
ICASSP2021
Qiujia Li, David Qiu, Yu Zhang 0033, Bo Li 0028, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman, 
Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition.
ICASSP2021
Guangzhi Sun, Chao Zhang 0031, Philip C. Woodland, 
Transformer Language Models with LSTM-Based Cross-Utterance Information Representation.
ICASSP2021
Guangzhi Sun, D. Liu, Chao Zhang 0031, Philip C. Woodland, 
Content-Aware Speaker Embeddings for Speaker Diarisation.
ICASSP2021
Wen Wu, Chao Zhang 0031, Philip C. Woodland, 
Emotion Recognition by Fusing Time Synchronous and Time Asynchronous Representations.
Interspeech2021
Dongcheng Jiang, Chao Zhang 0031, Philip C. Woodland, 
Variable Frame Rate Acoustic Models Using Minimum Error Reinforcement Learning.
Interspeech2021
Qiujia Li, Yu Zhang 0033, Bo Li 0028, Liangliang Cao, Philip C. Woodland, 
Residual Energy-Based Models for End-to-End Speech Recognition.
Interspeech2020
Florian L. Kreyssig, Philip C. Woodland, 
Cosine-Distance Virtual Adversarial Training for Semi-Supervised Speaker-Discriminative Acoustic Embeddings.
ICASSP2019
Guangzhi Sun, Chao Zhang 0031, Philip C. Woodland, 
Speaker Diarisation Using 2D Self-attentive Combination of Embeddings.
ICASSP2019
Chao Zhang 0031, Florian L. Kreyssig, Qiujia Li, Philip C. Woodland, 
PyHTK: Python Library and ASR Pipelines for HTK.
Interspeech2019
Patrick von Platen, Chao Zhang 0031, Philip C. Woodland, 
Multi-Span Acoustic Modelling Using Raw Waveform Signals.
ICASSP2018
Florian L. Kreyssig, Chao Zhang 0031, Philip C. Woodland, 
Improved TDNNs Using Deep Kernels and Frequency Dependent Grid-RNNs.
ICASSP2018
Chao Zhang 0031, Philip C. Woodland, 
High Order Recurrent Neural Networks for Acoustic Modelling.
Interspeech2018
Adnan Haider, Philip C. Woodland, 
Combining Natural Gradient with Hessian Free Methods for Sequence Training.
Interspeech2018
Yu Wang 0027, Chao Zhang 0031, Mark J. F. Gales, Philip C. Woodland, 
Speaker Adaptation and Adaptive Training for Jointly Optimised Tandem Systems.
Interspeech2018
Chao Zhang 0031, Philip C. Woodland, 
Semi-tied Units for Efficient Gating in LSTM and Highway Networks.
TASLP2022
Bin Wu, Sakriani Sakti, Jinsong Zhang 0001, Satoshi Nakamura 0001, 
Modeling Unsupervised Empirical Adaptation by DPGMM and DPGMM-RNN Hybrid Model to Extract Perceptual Features for Low-Resource ASR.
Interspeech2022
Jingwen Cheng, Yuchen Yan, Yingming Gao, Xiaoli Feng, Yannan Wang, Jinsong Zhang 0001, 
A study of production error analysis for Mandarin-speaking Children with Hearing Impairment.
Interspeech2022
Yujia Jin, Yanlu Xie, Jinsong Zhang 0001, 
A VR Interactive 3D Mandarin Pronunciation Teaching Model.
Interspeech2022
Longfei Yang, Jinsong Zhang 0001, Takahiro Shinozaki, 
Self-Supervised Learning with Multi-Target Contrastive Coding for Non-Native Acoustic Modeling of Mispronunciation Verification.
TASLP2021
Bin Wu, Sakriani Sakti, Jinsong Zhang 0001, Satoshi Nakamura 0001, 
Tackling Perception Bias in Unsupervised Phoneme Discovery Using DPGMM-RNN Hybrid Model and Functional Load.
Interspeech2021
Linkai Peng, Kaiqi Fu, Binghuai Lin, Dengfeng Ke, Jinsong Zhang 0001, 
A Study on Fine-Tuning wav2vec2.0 Model for the Task of Mispronunciation Detection and Diagnosis.
Interspeech2021
Yuqing Zhang 0003, Zhu Li, Binghuai Lin, Jinsong Zhang 0001, 
A Preliminary Study on Discourse Prosody Encoding in L1 and L2 English Spontaneous Narratives.
Interspeech2021
Yuqing Zhang 0003, Zhu Li, Bin Wu, Yanlu Xie, Binghuai Lin, Jinsong Zhang 0001, 
Relationships Between Perceptual Distinctiveness, Articulatory Complexity and Functional Load in Speech Communication.
Interspeech2020
Wang Dai, Jinsong Zhang 0001, Yingming Gao, Wei Wei, Dengfeng Ke, Binghuai Lin, Yanlu Xie, 
Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism.
Interspeech2020
Dan Du, Xianjin Zhu, Zhu Li, Jinsong Zhang 0001, 
Perception and Production of Mandarin Initial Stops by Native Urdu Speakers.
Interspeech2020
Yingming Gao, Xinyu Zhang, Yi Xu, Jinsong Zhang 0001, Peter Birkholz, 
An Investigation of the Target Approximation Model for Tone Modeling and Recognition in Continuous Mandarin Speech.
Interspeech2020
Binghuai Lin, Liyuan Wang, Xiaoli Feng, Jinsong Zhang 0001, 
Automatic Scoring at Multi-Granularity for L2 Pronunciation.
Interspeech2020
Binghuai Lin, Liyuan Wang, Xiaoli Feng, Jinsong Zhang 0001, 
Joint Detection of Sentence Stress and Phrase Boundary for Prosody.
Interspeech2020
Yanlu Xie, Xiaoli Feng, Boxue Li, Jinsong Zhang 0001, Yujia Jin, 
A Mandarin L2 Learning APP with Mispronunciation Detection and Feedback.
Interspeech2020
Longfei Yang, Kaiqi Fu, Jinsong Zhang 0001, Takahiro Shinozaki, 
Pronunciation Erroneous Tendency Detection with Language Adversarial Represent Learning.
Interspeech2019
Dan Du, Jinsong Zhang 0001, 
The Production of Chinese Affricates /ts/ and /tsh/ by Native Urdu Speakers.
Interspeech2019
Shuju Shi, Chilin Shih, Jinsong Zhang 0001, 
Capturing L1 Influence on L2 Pronunciation by Simulating Perceptual Space Using Acoustic Features.
Interspeech2018
Chong Cao, Wei Wei, Wei Wang, Yanlu Xie, Jinsong Zhang 0001, 
Interactions between Vowels and Nasal Codas in Mandarin Speakers' Perception of Nasal Finals.
Interspeech2018
Lixia Hao, Wei Zhang 0190, Yanlu Xie, Jinsong Zhang 0001, 
A Preliminary Study on Tonal Coarticulation in Continuous Speech.
Interspeech2018
Yue Sun, Win Thuzar Kyaw, Jinsong Zhang 0001, Yoshinori Sagisaka, 
Analysis of L2 Learners' Progress of Distinguishing Mandarin Tone 2 and Tone 3.
Interspeech2022
Matthew Perez, Mimansa Jaiswal, Minxue Niu, Cristina Gorrostieta, Matthew Roddy, Kye Taylor, Reza Lotfian, John Kane, Emily Mower Provost, 
Mind the gap: On the value of silence representations to lexical-based speech emotion recognition.
Interspeech2022
Amrit Romana, Minxue Niu, Matthew Perez, Angela Roberts, Emily Mower Provost, 
Enabling Off-the-Shelf Disfluency Detection and Categorization for Pathological Speech.
SpeechComm2021
Brian Stasak, Julien Epps, Heather T. Schatten, Ivan W. Miller, Emily Mower Provost, Michael F. Armey, 
Read speech voice quality and disfluency in individuals with recent suicidal ideation or suicide attempt.
Interspeech2021
Matthew Perez, Amrit Romana, Angela Roberts, Noelle Carlozzi, Jennifer Ann Miner, Praveen Dayalu, Emily Mower Provost, 
Articulatory Coordination for Speech Motor Tracking in Huntington Disease.
Interspeech2021
Amrit Romana, John Bandon, Matthew Perez, Stephanie Gutierrez, Richard Richter, Angela Roberts, Emily Mower Provost, 
Automatically Detecting Errors and Disfluencies in Read Speech to Predict Cognitive Impairment in People with Parkinson's Disease.
NAACL2021
Zakaria Aldeneh, Matthew Perez, Emily Mower Provost, 
Learning Paralinguistic Features from Audiobooks through Style Voice Conversion.
Interspeech2020
Matthew Perez, Zakaria Aldeneh, Emily Mower Provost, 
Aphasic Speech Recognition Using a Mixture of Speech Intelligibility Experts.
Interspeech2020
Amrit Romana, John Bandon, Noelle Carlozzi, Angela Roberts, Emily Mower Provost, 
Classification of Manifest Huntington Disease Using Vowel Distortion Measures.
ICASSP2019
Mimansa Jaiswal, Zakaria Aldeneh, Cristian-Paul Bara, Yuanhang Luo, Mihai Burzo, Rada Mihalcea, Emily Mower Provost, 
Muse-ing on the Impact of Utterance Ordering on Crowdsourced Emotion Annotations.
ICASSP2019
Biqiao Zhang, Soheil Khorram, Emily Mower Provost, 
Exploiting Acoustic and Lexical Properties of Phonemes to Recognize Valence from Speech.
Interspeech2019
Zakaria Aldeneh, Mimansa Jaiswal, Michael Picheny, Melvin G. McInnis, Emily Mower Provost, 
Identifying Mood Episodes Using Dialogue Features from Clinical Interviews.
Interspeech2019
John Gideon, Heather T. Schatten, Melvin G. McInnis, Emily Mower Provost, 
Emotion Recognition from Natural Phone Conversations in Individuals with and without Recent Suicidal Ideation.
Interspeech2019
Katie Matton, Melvin G. McInnis, Emily Mower Provost, 
Into the Wild: Transitioning from Recognizing Mood in Clinical Interactions to Personal Conversations for Individuals with Bipolar Disorder.
AAAI2019
Biqiao Zhang, Yuqing Kong, Georg Essl, Emily Mower Provost, 
f-Similarity Preservation Loss for Soft Labels: A Demonstration on Cross-Corpus Speech Emotion Recognition.
SpeechComm2018
Duc Le, Keli Licata, Emily Mower Provost, 
Automatic quantitative analysis of spontaneous aphasic speech.
ICASSP2018
Zakaria Aldeneh, Dimitrios Dimitriadis, Emily Mower Provost, 
Improving End-of-Turn Detection in Spoken Dialogues by Detecting Speaker Intentions as a Secondary Task.
Interspeech2018
Soheil Khorram, Mimansa Jaiswal, John Gideon, Melvin G. McInnis, Emily Mower Provost, 
The PRIORI Emotion Dataset: Linking Mood to Emotion Detected In-the-Wild.
Interspeech2018
Matthew Perez, Wenyu Jin 0001, Duc Le, Noelle Carlozzi, Praveen Dayalu, Angela Roberts, Emily Mower Provost, 
Classification of Huntington Disease Using Acoustic and Lexical Features.
Interspeech2017
John Gideon, Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Emily Mower Provost, 
Progressive Neural Networks for Transfer Learning in Emotion Recognition.
Interspeech2017
Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Melvin G. McInnis, Emily Mower Provost, 
Capturing Long-Term Temporal Dependencies with Convolutional Networks for Continuous Emotion Recognition.
ICASSP2021
Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang 0033, Ye Jia, Ron J. Weiss, Yonghui Wu, 
Parallel Tacotron: Non-Autoregressive and Controllable TTS.
Interspeech2021
Nanxin Chen, Yu Zhang 0033, Heiga Zen, Ron J. Weiss, Mohammad Norouzi 0002, Najim Dehak, William Chan, 
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis.
Interspeech2021
Peidong Wang, Tara N. Sainath, Ron J. Weiss, 
Multitask Training with Text Data for End-to-End Speech Recognition.
ICLR2021
Nanxin Chen, Yu Zhang 0033, Heiga Zen, Ron J. Weiss, Mohammad Norouzi 0002, William Chan, 
WaveGrad: Estimating Gradients for Waveform Generation.
ICASSP2020
Tara N. Sainath, Ruoming Pang, Ron J. Weiss, Yanzhang He, Chung-Cheng Chiu, Trevor Strohman, 
An Attention-Based Joint Acoustic and Text on-Device End-To-End Model.
ICASSP2020
Guangzhi Sun, Yu Zhang 0033, Ron J. Weiss, Yuan Cao 0007, Heiga Zen, Yonghui Wu, 
Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis.
TASLP2019
Jan Chorowski, Ron J. Weiss, Samy Bengio, Aäron van den Oord, 
Unsupervised Speech Representation Learning Using WaveNet Autoencoders.
ICASSP2019
Jinxi Guo, Tara N. Sainath, Ron J. Weiss, 
A Spelling Correction Model for End-to-end Speech Recognition.
ICASSP2019
Wei-Ning Hsu, Yu Zhang 0033, Ron J. Weiss, Yu-An Chung, Yuxuan Wang 0002, Yonghui Wu, James R. Glass, 
Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization.
Interspeech2019
Fadi Biadsy, Ron J. Weiss, Pedro J. Moreno 0001, Dimitri Kanevsky, Ye Jia, 
Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation.
Interspeech2019
Ye Jia, Ron J. Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, Yonghui Wu, 
Direct Speech-to-Speech Translation with a Sequence-to-Sequence Model.
Interspeech2019
Quan Wang, Hannah Muckenhirn, Kevin W. Wilson, Prashant Sridhar, Zelin Wu, John R. Hershey, Rif A. Saurous, Ron J. Weiss, Ye Jia, Ignacio Lopez-Moreno, 
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking.
Interspeech2019
Heiga Zen, Viet Dang, Rob Clark, Yu Zhang 0033, Ron J. Weiss, Ye Jia, Zhifeng Chen, Yonghui Wu, 
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech.
Interspeech2019
Yu Zhang 0033, Ron J. Weiss, Heiga Zen, Yonghui Wu, Zhifeng Chen, R. J. Skerry-Ryan, Ye Jia, Andrew Rosenberg, Bhuvana Ramabhadran, 
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning.
ICLR2019
Wei-Ning Hsu, Yu Zhang 0033, Ron J. Weiss, Heiga Zen, Yonghui Wu, Yuxuan Wang 0002, Yuan Cao 0007, Ye Jia, Zhifeng Chen, Jonathan Shen, Patrick Nguyen, Ruoming Pang, 
Hierarchical Generative Modeling for Controllable Speech Synthesis.
ICASSP2018
Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Ekaterina Gonina, Navdeep Jaitly, Bo Li 0028, Jan Chorowski, Michiel Bacchiani, 
State-of-the-Art Speech Recognition with Sequence-to-Sequence Models.
ICASSP2018
Jan Chorowski, Ron J. Weiss, Rif A. Saurous, Samy Bengio, 
On Using Backpropagation for Speech Texture Generation and Voice Conversion.
ICASSP2018
Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang 0033, Yuxuan Wang 0002, R. J. Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu, 
Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions.
ICASSP2018
Shubham Toshniwal, Tara N. Sainath, Ron J. Weiss, Bo Li 0028, Pedro J. Moreno 0001, Eugene Weinstein, Kanishka Rao, 
Multilingual Speech Recognition with a Single End-to-End Model.
ICML2018
R. J. Skerry-Ryan, Eric Battenberg, Ying Xiao, Yuxuan Wang 0002, Daisy Stanton, Joel Shor, Ron J. Weiss, Rob Clark, Rif A. Saurous, 
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron.
ICASSP2022
Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
Interspeech2022
Wangyou Zhang, Zhuo Chen 0006, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei, 
Separating Long-Form Speech with Group-wise Permutation Invariant Training.
ICASSP2021
Jeremy H. M. Wong, Xiong Xiao, Yifan Gong 0001, 
Hidden Markov Model Diarisation with Speaker Location Information.
ICASSP2021
Xiong Xiao, Naoyuki Kanda, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao 0008, Gang Liu, Yu Wu 0012, Jian Wu 0027, Shujie Liu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.
ICASSP2020
Zhuo Chen 0006, Takuya Yoshioka, Liang Lu 0001, Tianyan Zhou, Zhong Meng, Yi Luo 0004, Jian Wu 0027, Xiong Xiao, Jinyu Li 0001, 
Continuous Speech Separation: Dataset and Analysis.
ICASSP2020
Jixuan Wang, Xiong Xiao, Jian Wu 0027, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno, 
Speaker Diarization with Session-Level Speaker Embedding Refinement Using Graph Neural Networks.
Interspeech2020
Jixuan Wang, Xiong Xiao, Jian Wu 0027, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno, 
Speaker Attribution with Voice Profiles by Graph-Based Semi-Supervised Learning.
ICASSP2019
Takuya Yoshioka, Zhuo Chen 0006, Changliang Liu, Xiong Xiao, Hakan Erdogan, Dimitrios Dimitriadis, 
Low-latency Speaker-independent Continuous Speech Separation.
SpeechComm2018
Van Tung Pham, Haihua Xu, Xiong Xiao, Nancy F. Chen, Eng Siong Chng, Haizhou Li 0001, 
Re-ranking spoken term detection with acoustic exemplars of keywords.
ICASSP2018
Jinyu Li 0001, Rui Zhao 0017, Zhuo Chen 0006, Changliang Liu, Xiong Xiao, Guoli Ye, Yifan Gong 0001, 
Developing Far-Field Speaker System Via Teacher-Student Learning.
Interspeech2018
Takuya Yoshioka, Hakan Erdogan, Zhuo Chen 0006, Xiong Xiao, Fil Alleva, 
Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks.
Interspeech2017
Kong-Aik Lee, Ville Hautamäki, Tomi Kinnunen, Anthony Larcher, Chunlei Zhang, Andreas Nautsch, Themos Stafylakis, Gang Liu 0001, Mickaël Rouvier, Wei Rao, Federico Alegre, J. Ma, Man-Wai Mak, Achintya Kumar Sarkar, Héctor Delgado, Rahim Saeidi, Hagai Aronowitz, Aleksandr Sizov, Hanwu Sun, Trung Hieu Nguyen 0001, G. Wang, Bin Ma 0001, Ville Vestman, Md. Sahidullah, M. Halonen, Anssi Kanervisto, Gaël Le Lan, Fahimeh Bahmaninezhad, Sergey Isadskiy, Christian Rathgeb, Christoph Busch 0001, Georgios Tzimiropoulos, Q. Qian, Z. Wang, Q. Zhao, T. Wang, H. Li, J. Xue, S. Zhu, R. Jin, T. Zhao, Pierre-Michel Bousquet, Moez Ajili, Waad Ben Kheder, Driss Matrouf, Zhi Hao Lim, Chenglin Xu, Haihua Xu, Xiong Xiao, Eng Siong Chng, Benoit G. B. Fauve, Kaavya Sriskandaraja, Vidhyasaharan Sethu, W. W. Lin, Dennis Alexander Lehmann Thomsen, Zheng-Hua Tan, Massimiliano Todisco, Nicholas W. D. Evans, Haizhou Li 0001, John H. L. Hansen, Jean-François Bonastre, Eliathamby Ambikairajah, 
The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016.
Interspeech2017
Chenglin Xu, Xiong Xiao, Sining Sun, Wei Rao, Eng Siong Chng, Haizhou Li 0001, 
Weighted Spatial Covariance Matrix Estimation for MUSIC Based TDOA Estimation of Speech Source.
TASLP2016
Duc Hoang Ha Nguyen, Xiong Xiao, Eng Siong Chng, Haizhou Li 0001, 
Feature Adaptation Using Linear Spectro-Temporal Transform for Robust Speech Recognition.
ICASSP2016
Nancy F. Chen, Van Tung Pham, Haihua Xu, Xiong Xiao, Van Hai Do, Chongjia Ni, I-Fan Chen, Sunil Sivadas, Chin-Hui Lee, Eng Siong Chng, Bin Ma 0001, Haizhou Li 0001, 
Exemplar-inspired strategies for low-resource spoken keyword search in Swahili.
ICASSP2016
Tian Tan 0002, Yanmin Qian, Dong Yu 0001, Souvik Kundu 0003, Liang Lu 0001, Khe Chai Sim, Xiong Xiao, Yu Zhang 0033, 
Speaker-aware training of LSTM-RNNS for acoustic modelling.
ICASSP2016
Xiaohai Tian, Zhizheng Wu 0001, Xiong Xiao, Eng Siong Chng, Haizhou Li 0001, 
Spoofing detection from a feature representation perspective.
ICASSP2016
Xiong Xiao, Shengkui Zhao, Thi Ngoc Tho Nguyen, Douglas L. Jones, Eng Siong Chng, Haizhou Li 0001, 
An expectation-maximization eigenvector clustering approach to direction of arrival estimation of multiple speech sources.
TASLP2022
Zhengjun Yue, Erfan Loweimi, Heidi Christensen, Jon Barker, Zoran Cvetkovic, 
Acoustic Modelling From Raw Source and Filter Components for Dysarthric Speech Recognition.
Interspeech2022
Samuel Hollands, Daniel Blackburn, Heidi Christensen, 
Evaluating the Performance of State-of-the-Art ASR Systems on Non-Native English using Corpora with Extensive Language Background Variation.
Interspeech2022
Bahman Mirheidari, Daniel Blackburn, Heidi Christensen, 
Automatic cognitive assessment: Combining sparse datasets with disparate cognitive scores.
Interspeech2022
Bahman Mirheidari, André Bittar, Nicholas Cummins, Johnny Downs, Helen L. Fisher, Heidi Christensen, 
Automatic Detection of Expressed Emotion from Five-Minute Speech Samples: Challenges and Opportunities.
Interspeech2022
Zhengjun Yue, Erfan Loweimi, Heidi Christensen, Jon Barker, Zoran Cvetkovic, 
Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs.
SpeechComm2021
Lubna Alhinti, Heidi Christensen, Stuart P. Cunningham, 
Acoustic differences in emotional speech of people with dysarthria.
ICASSP2021
Yilin Pan, Venkata Srikanth Nallanthighal, Daniel Blackburn, Heidi Christensen, Aki Härmä, 
Multi-Task Estimation of Age and Cognitive Decline from Speech.
Interspeech2021
Heidi Christensen, 
Towards Automatic Speech Recognition for People with Atypical Speech.
Interspeech2021
Bahman Mirheidari, Yilin Pan, Daniel Blackburn, Ronan O'Malley, Heidi Christensen, 
Identifying Cognitive Impairment Using Sentence Representation Vectors.
Interspeech2021
Yilin Pan, Bahman Mirheidari, Jennifer M. Harris, Jennifer C. Thompson, Matthew Jones, Julie S. Snowden, Daniel Blackburn, Heidi Christensen, 
Using the Outputs of Different Automatic Speech Recognition Paradigms for Acoustic- and BERT-Based Alzheimer's Dementia Detection Through Spontaneous Speech.
Interspeech2021
Zhengjun Yue, Jon Barker, Heidi Christensen, Cristina McKean, Elaine Ashton, Yvonne Wren, Swapnil Gadgil, Rebecca Bright, 
Parental Spoken Scaffolding and Narrative Skills in Crowd-Sourced Storytelling Samples of Young Children.
ICASSP2020
Feifei Xiong, Jon Barker, Zhengjun Yue, Heidi Christensen, 
Source Domain Data Selection for Improved Transfer Learning Targeting Dysarthric Speech Recognition.
ICASSP2020
Zhengjun Yue, Feifei Xiong, Heidi Christensen, Jon Barker, 
Exploring Appropriate Acoustic and Language Modelling Choices for Continuous Dysarthric Speech Recognition.
Interspeech2020
Lubna Alhinti, Stuart P. Cunningham, Heidi Christensen, 
Recognising Emotions in Dysarthric Speech Using Typical Speech Data.
Interspeech2020
Nicholas Cummins, Yilin Pan, Zhao Ren, Julian Fritsch, Venkata Srikanth Nallanthighal, Heidi Christensen, Daniel Blackburn, Björn W. Schuller, Mathew Magimai-Doss, Helmer Strik, Aki Härmä, 
A Comparison of Acoustic and Linguistic Methodologies for Alzheimer's Dementia Recognition.
Interspeech2020
Bahman Mirheidari, Daniel Blackburn, Ronan O'Malley, Annalena Venneri, Traci Walker, Markus Reuber, Heidi Christensen, 
Improving Cognitive Impairment Classification by Generative Neural Network-Based Feature Augmentation.
Interspeech2020
Yilin Pan, Bahman Mirheidari, Markus Reuber, Annalena Venneri, Daniel Blackburn, Heidi Christensen, 
Improving Detection of Alzheimer's Disease Using Automatic Speech Recognition to Identify High-Quality Segments for More Robust Feature Extraction.
Interspeech2020
Yilin Pan, Bahman Mirheidari, Zehai Tu, Ronan O'Malley, Traci Walker, Annalena Venneri, Markus Reuber, Daniel Blackburn, Heidi Christensen, 
Acoustic Feature Extraction with Interpretable Deep Neural Network for Neurodegenerative Related Disorder Classification.
Interspeech2020
Zhengjun Yue, Heidi Christensen, Jon Barker, 
Autoencoder Bottleneck Features with Multi-Task Optimisation for Improved Continuous Dysarthric Speech Recognition.
ICASSP2019
Bahman Mirheidari, Daniel Blackburn, Ronan O'Malley, Traci Walker, Annalena Venneri, Markus Reuber, Heidi Christensen, 
Computational Cognitive Assessment: Investigating the Use of an Intelligent Virtual Agent for the Detection of Early Signs of Dementia.
ICASSP2022
Aswin Sivaraman, Scott Wisdom, Hakan Erdogan, John R. Hershey, 
Adapting Speech Separation to Real-World Meetings using Mixture Invariant Training.
Interspeech2022
Hannah Muckenhirn, Aleksandr Safin, Hakan Erdogan, Felix de Chaumont Quitry, Marco Tagliasacchi, Scott Wisdom, John R. Hershey, 
CycleGAN-based Unpaired Speech Dereverberation.
Interspeech2022
Katharine Patterson, Kevin W. Wilson, Scott Wisdom, John R. Hershey, 
Distance-Based Sound Separation.
ICASSP2021
Soumi Maiti, Hakan Erdogan, Kevin W. Wilson, Scott Wisdom, Shinji Watanabe 0001, John R. Hershey, 
End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings.
Interspeech2021
Cong Han, Yi Luo 0004, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe 0001, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen 0006, 
Continuous Speech Separation Using Speaker Inventory for Long Recording.
ICLR2021
Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Dan Ellis, John R. Hershey, 
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds.
ICASSP2020
Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, Daniel P. W. Ellis, 
Improving Universal Sound Separation Using Sound Classification.
ICASSP2019
Jonathan Le Roux, Scott Wisdom, Hakan Erdogan, John R. Hershey, 
SDR - Half-baked or Well Done?
ICASSP2019
Scott Wisdom, John R. Hershey, Kevin W. Wilson, Jeremy Thorpe, Michael Chinen, Brian Patton, Rif A. Saurous, 
Differentiable Consistency Constraints for Improved Deep Speech Enhancement.
Interspeech2019
Hiroshi Seki, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux, John R. Hershey, 
End-to-End Multilingual Multi-Speaker Speech Recognition.
Interspeech2019
Quan Wang, Hannah Muckenhirn, Kevin W. Wilson, Prashant Sridhar, Zelin Wu, John R. Hershey, Rif A. Saurous, Ron J. Weiss, Ye Jia, Ignacio Lopez-Moreno, 
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking.
ICASSP2018
Tsubasa Ochiai, Shinji Watanabe 0001, Shigeru Katagiri, Takaaki Hori, John R. Hershey, 
Speaker Adaptation for Multichannel End-to-End Speech Recognition.
ICASSP2018
Hiroshi Seki, Shinji Watanabe 0001, Takaaki Hori, Jonathan Le Roux, John R. Hershey, 
An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech.
ICASSP2018
Shane Settle, Jonathan Le Roux, Takaaki Hori, Shinji Watanabe 0001, John R. Hershey, 
End-to-End Multi-Speaker Speech Recognition.
ICASSP2018
Zhong-Qiu Wang, Jonathan Le Roux, John R. Hershey, 
Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation.
ICASSP2018
Zhong-Qiu Wang, Jonathan Le Roux, John R. Hershey, 
Alternative Objective Functions for Deep Clustering.
Interspeech2018
Zhong-Qiu Wang, Jonathan Le Roux, DeLiang Wang, John R. Hershey, 
End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction.
ACL2018
Hiroshi Seki, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux, John R. Hershey, 
A Purely End-to-End System for Multi-speaker Speech Recognition.
ICASSP2017
Yi Luo 0004, Zhuo Chen 0006, John R. Hershey, Jonathan Le Roux, Nima Mesgarani, 
Deep clustering and conventional networks for music separation: Stronger together.
ICASSP2017
Shinji Watanabe 0001, Takaaki Hori, Jonathan Le Roux, John R. Hershey, 
Student-teacher network learning with enhanced features.
ICASSP2022
Zehua Chen, Xu Tan 0003, Ke Wang, Shifeng Pan, Danilo P. Mandic, Lei He 0005, Sheng Zhao, 
Infergrad: Improving Diffusion Models for Vocoder by Considering Inference in Training.
ICASSP2022
Liyang Chen, Zhiyong Wu 0001, Jun Ling, Runnan Li, Xu Tan 0003, Sheng Zhao, 
Transformer-S2A: Robust and Efficient Speech-to-Animation.
ICASSP2022
Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan 0003, Sheng Zhao, Tan Lee, 
A Study on the Efficacy of Model Pre-Training In Developing Neural Text-to-Speech System.
Interspeech2022
Yanqing Liu, Ruiqing Xue, Lei He 0005, Xu Tan 0003, Sheng Zhao, 
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders.
Interspeech2022
Yihan Wu, Xu Tan 0003, Bohan Li 0003, Lei He 0005, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu, 
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios.
Interspeech2022
Guangyan Zhang, Kaitao Song, Xu Tan 0003, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao, 
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech.
ACL2022
Yi Ren 0006, Xu Tan 0003, Tao Qin, Zhou Zhao, Tie-Yan Liu, 
Revisiting Over-Smoothness in Text to Speech.
ICASSP2021
Yichong Leng, Xu Tan 0003, Sheng Zhao, Frank K. Soong, Xiang-Yang Li 0001, Tao Qin, 
MBNET: MOS Prediction for Synthesized Speech with Mean-Bias Network.
ICASSP2021
Renqian Luo, Xu Tan 0003, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen, Tie-Yan Liu, 
Lightspeech: Lightweight and Fast Text to Speech with Neural Architecture Search.
ICASSP2021
Linghui Meng 0001, Jin Xu 0010, Xu Tan 0003, Jindong Wang 0001, Tao Qin, Bo Xu 0002, 
MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition.
ICASSP2021
Yuzi Yan, Xu Tan 0003, Bohan Li 0003, Tao Qin, Sheng Zhao, Yuan Shen 0001, Tie-Yan Liu, 
Adaspeech 2: Adaptive Text to Speech with Untranscribed Data.
ICASSP2021
Chen Zhang 0020, Yi Ren 0006, Xu Tan 0003, Jinglin Liu, Kejun Zhang, Tao Qin, Sheng Zhao, Tie-Yan Liu, 
Denoispeech: Denoising Text to Speech with Frame-Level Noise Modeling.
Interspeech2021
Wenxin Hou, Jindong Wang 0001, Xu Tan 0003, Tao Qin, Takahiro Shinozaki, 
Cross-Domain Speech Recognition with Unsupervised Character-Level Distribution Matching.
Interspeech2021
Yuzi Yan, Xu Tan 0003, Bohan Li 0003, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan Shen 0001, Wei-Qiang Zhang, Tie-Yan Liu, 
Adaptive Text to Speech for Spontaneous Style.
NeurIPS2021
Jiawei Chen 0008, Xu Tan 0003, Yichong Leng, Jin Xu 0010, Guihua Wen, Tao Qin, Tie-Yan Liu, 
Speech-T: Transducer for Text to Speech and Beyond.
NeurIPS2021
Yichong Leng, Xu Tan 0003, Linchen Zhu, Jin Xu 0010, Renqian Luo, Linquan Liu, Tao Qin, Xiangyang Li 0001, Edward Lin, Tie-Yan Liu, 
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition.
ICLR2021
Yi Ren 0006, Chenxu Hu, Xu Tan 0003, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu, 
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.
ICLR2021
Mingjian Chen, Xu Tan 0003, Bohan Li 0003, Yanqing Liu, Tao Qin, Sheng Zhao, Tie-Yan Liu, 
AdaSpeech: Adaptive Text to Speech for Custom Voice.
AAAI2021
Chen Zhang 0020, Xu Tan 0003, Yi Ren 0006, Tao Qin, Kejun Zhang, Tie-Yan Liu, 
UWSpeech: Speech to Speech Translation for Unwritten Languages.
Interspeech2020
Mingjian Chen, Xu Tan 0003, Yi Ren 0006, Jin Xu 0010, Hao Sun, Sheng Zhao, Tao Qin, 
MultiSpeech: Multi-Speaker Text to Speech with Transformer.
ICASSP2022
Xiuyi Chen, Feilong Chen, Shuang Xu, Bo Xu 0002, 
A Multi Domain Knowledge Enhanced Matching Network for Response Selection in Retrieval-Based Dialogue Systems.
ICASSP2022
Feilong Chen, Xiuyi Chen, Shuang Xu, Bo Xu 0002, 
Improving Cross-Modal Understanding in Visual Dialog Via Contrastive Learning.
ICASSP2022
Minglun Han, Linhao Dong, Zhenlin Liang, Meng Cai, Shiyu Zhou, Zejun Ma, Bo Xu 0002, 
Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection.
ICASSP2021
Yunzhe Hao, Jiaming Xu 0001, Peng Zhang, Bo Xu 0002, 
Wase: Learning When to Attend for Speaker Extraction in Cocktail Party Environments.
ICASSP2021
Chenxing Li, Jiaming Xu 0001, Nima Mesgarani, Bo Xu 0002, 
Speaker and Direction Inferred Dual-Channel Speech Separation.
ICASSP2021
Linghui Meng 0001, Jin Xu 0010, Xu Tan 0003, Jindong Wang 0001, Tao Qin, Bo Xu 0002, 
MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition.
Interspeech2021
Zhiyun Fan, Meng Li, Shiyu Zhou, Bo Xu 0002, 
Exploring wav2vec 2.0 on Speaker Verification and Language Identification.
Interspeech2021
Xiyun Li, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Jiaming Xu 0001, Bo Xu 0002, Dong Yu 0001, 
MIMO Self-Attentive RNN Beamformer for Multi-Speaker Speech Separation.
AAAI2021
Qianqian Dong, Mingxuan Wang, Hao Zhou 0012, Shuang Xu, Bo Xu 0002, Lei Li 0005, 
Consecutive Decoding for Speech-to-text Translation.
AAAI2021
Qianqian Dong, Rong Ye, Mingxuan Wang, Hao Zhou 0012, Shuang Xu, Bo Xu 0002, Lei Li 0005, 
Listen, Understand and Translate: Triple Supervision Decouples End-to-end Speech-to-text Translation.
ICASSP2020
Linhao Dong, Bo Xu 0002, 
CIF: Continuous Integrate-And-Fire for End-To-End Speech Recognition.
Interspeech2020
Jing Shi 0003, Jiaming Xu 0001, Yusuke Fujita, Shinji Watanabe 0001, Bo Xu 0002, 
Speaker-Conditional Chain Model for Speech Separation and Extraction.
Interspeech2020
Yunzhe Hao, Jiaming Xu 0001, Jing Shi 0003, Peng Zhang, Lei Qin, Bo Xu 0002, 
A Unified Framework for Low-Latency Speaker Extraction in Cocktail Party Environments.
ICASSP2019
Linhao Dong, Feng Wang 0023, Bo Xu 0002, 
Self-attention Aligner: A Latency-control End-to-end Model for ASR Using Self-attention Network and Chunk-hopping.
Interspeech2019
Yuxiang Zou, Linhao Dong, Bo Xu 0002, 
Boosting Character-Based Chinese Speech Synthesis via Multi-Task Learning and Dictionary Tutoring.
ICASSP2018
Linhao Dong, Shuang Xu, Bo Xu 0002, 
Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition.
Interspeech2018
Linhao Dong, Shiyu Zhou, Wei Chen 0048, Bo Xu 0002, 
Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin.
Interspeech2018
Ruifang Ji, Xinyuan Cai, Bo Xu 0002, 
An End-to-End Text-Independent Speaker Identification System on Short Utterances.
Interspeech2018
Chenxing Li, Tieqiang Wang, Shuang Xu, Bo Xu 0002, 
Single-channel Speech Dereverberation via Generative Adversarial Training.
Interspeech2018
Shiyu Zhou, Linhao Dong, Shuang Xu, Bo Xu 0002, 
Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese.
ICASSP2022
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Naoyuki Kamo, Takafumi Moriya, 
Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition.
Interspeech2022
Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolíková, Hiroshi Sato, Tomohiro Nakatani, 
Listen only to me! How well can target speech extraction handle false alarms?
Interspeech2022
Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri, 
How bad are artifacts?: Analyzing the impact of speech enhancement errors on ASR.
Interspeech2022
Martin Kocour, Katerina Zmolíková, Lucas Ondel, Jan Svec, Marc Delcroix, Tsubasa Ochiai, Lukás Burget, Jan Cernocký, 
Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model.
Interspeech2022
Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki, 
Streaming Target-Speaker ASR with Neural Transducer.
Interspeech2022
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura, 
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.
ICASSP2021
Christoph Böddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Reinhold Haeb-Umbach, 
Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.
ICASSP2021
Marc Delcroix, Katerina Zmolíková, Tsubasa Ochiai, Keisuke Kinoshita, Tomohiro Nakatani, 
Speaker Activity Driven Neural Speech Extraction.
ICASSP2021
Takafumi Moriya, Takanori Ashihara, Tomohiro Tanaka, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Yusuke Ijima, Ryo Masumura, Yusuke Shinohara, 
Simpleflat: A Simple Whole-Network Pre-Training Approach for RNN Transducer-Based End-to-End Speech Recognition.
ICASSP2021
Julio Wissing, Benedikt T. Boenninghoff, Dorothea Kolossa, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Christopher Schymura, 
Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain.
Interspeech2021
Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Shoko Araki, 
Few-Shot Learning of New Sound Classes for Target Sound Extraction.
Interspeech2021
Takafumi Moriya, Tomohiro Tanaka, Takanori Ashihara, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Ryo Masumura, Marc Delcroix, Taichi Asami, 
Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture.
Interspeech2021
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoyuki Kamo, 
Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition.
Interspeech2021
Christopher Schymura, Benedikt T. Bönninghoff, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa, 
PILOT: Introducing Transformers for Probabilistic Sound Event Localization.
ICASSP2020
Marc Delcroix, Tsubasa Ochiai, Katerina Zmolíková, Keisuke Kinoshita, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki, 
Improving Speaker Discrimination of Target Speech Extraction With Time-Domain Speakerbeam.
ICASSP2020
Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, 
Improving Noise Robust Automatic Speech Recognition with Single-Channel Time-Domain Enhancement Network.
ICASSP2020
Christopher Schymura, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa, 
A Dynamic Stream Weight Backprop Kalman Filter for Audiovisual Speaker Tracking.
Interspeech2020
Takafumi Moriya, Tsubasa Ochiai, Shigeki Karita, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara, Marc Delcroix, 
Self-Distillation for Improving CTC-Transformer-Based ASR Systems.
Interspeech2020
Tsubasa Ochiai, Marc Delcroix, Yuma Koizumi, Hiroaki Ito, Keisuke Kinoshita, Shoko Araki, 
Listen to What You Want: Neural Network-Based Universal Sound Selector.
ICASSP2019
Marc Delcroix, Katerina Zmolíková, Tsubasa Ochiai, Keisuke Kinoshita, Shoko Araki, Tomohiro Nakatani, 
Compact Network for Speakerbeam Target Speaker Extraction.
Interspeech2021
Huyen Nguyen, Ralph Vente, David Lupea, Sarah Ita Levitan, Julia Hirschberg, 
Acoustic-Prosodic, Lexical and Demographic Cues to Persuasiveness in Competitive Debate Speeches.
SpeechComm2020
Andreas Weise, Sarah Ita Levitan, Julia Hirschberg, Rivka Levitan, 
Individual differences in acoustic-prosodic entrainment in spoken dialogue.
SpeechComm2020
Ramiro H. Gálvez, Agustín Gravano, Stefan Benus, Rivka Levitan, Marián Trnka, Julia Hirschberg, 
An empirical study of the effect of acoustic-prosodic entrainment on the perceived trustworthiness of conversational avatars.
Interspeech2020
Jiaxuan Zhang, Sarah Ita Levitan, Julia Hirschberg, 
Multimodal Deception Detection Using Automatically Extracted Acoustic, Visual, and Lexical Features.
Interspeech2019
Alice Baird, Eduardo Coutinho, Julia Hirschberg, Björn W. Schuller, 
Sincerity in Acted Speech: Presenting the Sincere Apology Corpus and Results.
Interspeech2019
Victor Soto, Julia Hirschberg, 
Improving Code-Switched Language Modeling Performance Using Cognate Features.
Interspeech2019
Zixiaofan Yang, Julia Hirschberg, 
Linguistically-Informed Training of Acoustic Word Embeddings for Low-Resource Languages.
Interspeech2019
Zixiaofan Yang, Bingyan Hu, Julia Hirschberg, 
Predicting Humor by Learning from Time-Aligned Comments.
Interspeech2018
Guozhen An, Sarah Ita Levitan, Julia Hirschberg, Rivka Levitan, 
Deep Personality Recognition for Deception Detection.
Interspeech2018
Kai-Zhan Lee, Erica Cooper, Julia Hirschberg, 
A Comparison of Speaker-based and Utterance-based Data Selection for Text-to-Speech Synthesis.
Interspeech2018
Sarah Ita Levitan, Angel Maredia, Julia Hirschberg, 
Acoustic-Prosodic Indicators of Deception and Trust in Interview Dialogues.
Interspeech2018
Victor Soto, Nishmar Cestero, Julia Hirschberg, 
The Role of Cognate Words, POS Tags and Entrainment in Code-Switching.
Interspeech2018
Zixiaofan Yang, Julia Hirschberg, 
Predicting Arousal and Valence from Waveforms and Spectrograms Using Deep Neural Networks.
Interspeech2017
Erica Cooper, Xinyue Wang, Alison Chang, Yocheved Levitan, Julia Hirschberg, 
Utterance Selection for Optimizing Intelligibility of TTS Voices Trained on ASR Data.
Interspeech2017
Gideon Mendels, Sarah Ita Levitan, Kai-Zhan Lee, Julia Hirschberg, 
Hybrid Acoustic-Lexical Deep Learning Approach for Deception Detection.
Interspeech2017
Victor Soto, Julia Hirschberg, 
Crowdsourcing Universal Part-of-Speech Tags for Code-Switching.
Interspeech2016
Guozhen An, Sarah Ita Levitan, Rivka Levitan, Andrew Rosenberg, Michelle Levine, Julia Hirschberg, 
Automatically Classifying Self-Rated Personality Scores from Speech.
Interspeech2016
Erica Cooper, Alison Chang, Yocheved Levitan, Julia Hirschberg, 
Data Selection and Adaptation for Naturalness in HMM-Based Speech Synthesis.
Interspeech2016
Mona T. Diab, Pascale Fung, Julia Hirschberg, Thamar Solorio, 
Computational Approaches to Linguistic Code Switching.
Interspeech2016
Sarah Ita Levitan, Guozhen An, Min Ma, Rivka Levitan, Andrew Rosenberg, Julia Hirschberg, 
Combining Acoustic-Prosodic, Lexical, and Phonotactic Features for Automatic Deception Detection.
ICASSP2022
Lies Bollens, Tom Francart, Hugo Van hamme, 
Learning Subject-Invariant Representations from Speech-Evoked EEG Using Variational Autoencoders.
Interspeech2022
Quentin Meeus, Marie-Francine Moens, Hugo Van hamme, 
Multitask Learning for Low Resource Spoken Language Understanding.
Interspeech2022
Corentin Puffay, Jana Van Canneyt, Jonas Vanthornhout, Hugo Van hamme, Tom Francart, 
Relating the fundamental frequency of speech with EEG using a dilated convolutional network.
Interspeech2022
Bastiaan Tamm, Helena Balabin, Rik Vandenberghe, Hugo Van hamme, 
Pre-trained Speech Representations as Feature Extractors for Speech Quality Assessment in Online Conferencing Applications.
Interspeech2022
Pu Wang, Hugo Van hamme, 
Bottleneck Low-rank Transformers for Low-resource Spoken Language Understanding.
Interspeech2021
Wim Boes, Hugo Van hamme, 
Audiovisual Transfer Learning for Audio Tagging and Sound Event Detection.
Interspeech2021
Mohammad Jalilpour-Monesi, Bernd Accou, Tom Francart, Hugo Van hamme, 
Extracting Different Levels of Speech Information from EEG Using an LSTM-Based Model.
Interspeech2021
Jinzi Qi, Hugo Van hamme, 
Speech Disorder Classification Using Extended Factorized Hierarchical Variational Auto-Encoders.
Interspeech2021
Pu Wang, Bagher BabaAli, Hugo Van hamme, 
A Study into Pre-Training Strategies for Spoken Language Understanding on Dysarthric Speech.
ICASSP2020
Mohammad Jalilpour-Monesi, Bernd Accou, Jair Montoya-Martínez, Tom Francart, Hugo Van hamme, 
An LSTM Based Architecture to Relate Speech Stimulus to EEG.
ICASSP2020
Jakob Poncelet, Hugo Van hamme, 
Multitask Learning with Capsule Networks for Speech-to-Intent Applications.
Interspeech2019
Pieter Appeltans, Jeroen Zegers, Hugo Van hamme, 
Practical Applicability of Deep Neural Networks for Overlapping Speaker Separation.
Interspeech2019
Jeroen Zegers, Hugo Van hamme, 
CNN-LSTM Models for Multi-Speaker Source Separation Using Bayesian Hyper Parameter Optimization.
ICASSP2018
Jeroen Zegers, Hugo Van hamme, 
Multi-Scenario Deep Learning for Multi-Speaker Source Separation.
Interspeech2018
Vincent Renkens, Hugo Van hamme, 
Capsule Networks for Low Resource Spoken Language Understanding.
Interspeech2018
Lyan Verwimp, Hugo Van hamme, Vincent Renkens, Patrick Wambacq, 
State Gradients for RNN Memory Analysis.
Interspeech2018
Jeroen Zegers, Hugo Van hamme, 
Memory Time Span in LSTMs for Multi-Speaker Source Separation.
TASLP2017
Deepak Baby, Hugo Van hamme, 
Joint Denoising and Dereverberation Using Exemplar-Based Sparse Representations and Decaying Norm Constraint.
TASLP2017
Vincent Renkens, Hugo Van hamme, 
Weakly Supervised Learning of Hidden Markov Models for Spoken Language Acquisition.
Interspeech2017
Jeroen Zegers, Hugo Van hamme, 
Improving Source Separation via Multi-Speaker Representations.
SpeechComm2023
Premjeet Singh, Md. Sahidullah, Goutam Saha 0001, 
Modulation spectral features for speech emotion recognition using deep neural networks.
TASLP2022
Brij Mohan Lal Srivastava, Mohamed Maouche, Md. Sahidullah, Emmanuel Vincent 0001, Aurélien Bellet, Marc Tommasi, Natalia A. Tomashenko, Xin Wang 0037, Junichi Yamagishi, 
Privacy and Utility of X-Vector Based Speaker Anonymization.
ICASSP2022
Xuechen Liu, Md. Sahidullah, Tomi Kinnunen, 
Learnable Nonlinear Compression for Robust Speaker Verification.
Interspeech2021
Bhusan Chettri, Rosa González Hautamäki, Md. Sahidullah, Tomi Kinnunen, 
Data Quality as Predictor of Voice Anti-Spoofing Generalization.
Interspeech2021
Raphaël Duroselle, Md. Sahidullah, Denis Jouvet, Irina Illina, 
Modeling and Training Strategies for Language Recognition Systems.
Interspeech2021
Raphaël Duroselle, Md. Sahidullah, Denis Jouvet, Irina Illina, 
Language Recognition on Unknown Conditions: The LORIA-Inria-MULTISPEECH System for AP20-OLR Challenge.
Interspeech2021
Tomi Kinnunen, Andreas Nautsch, Md. Sahidullah, Nicholas W. D. Evans, Xin Wang 0037, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee, 
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing.
TASLP2020
Tomi Kinnunen, Héctor Delgado, Nicholas W. D. Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang 0037, Md. Sahidullah, Junichi Yamagishi, Douglas A. Reynolds, 
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals.
ICASSP2020
Brij Mohan Lal Srivastava, Nathalie Vauquier, Md. Sahidullah, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent 0001, 
Evaluating Voice Conversion-Based Privacy Protection against Informed Attackers.
Interspeech2020
Xuechen Liu, Md. Sahidullah, Tomi Kinnunen, 
A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings.
ICASSP2019
Tomi Kinnunen, Rosa González Hautamäki, Ville Vestman, Md. Sahidullah, 
Can We Use Speaker Recognition Technology to Attack Itself? Enhancing Mimicry Attacks Using Automatic Target Speaker Selection.
Interspeech2019
Massimiliano Todisco, Xin Wang 0037, Ville Vestman, Md. Sahidullah, Héctor Delgado, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Tomi H. Kinnunen, Kong Aik Lee, 
ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection.
SpeechComm2018
Rosa González Hautamäki, Md. Sahidullah, Ville Hautamäki, Tomi Kinnunen, 
Acoustical and perceptual study of voice disguise by age modification in speaker verification.
SpeechComm2018
Ville Vestman, Dhananjaya N. Gowda, Md. Sahidullah, Paavo Alku, Tomi Kinnunen, 
Speaker recognition from whispered speech: A tutorial survey and an application of time-varying linear prediction.
Interspeech2018
Massimiliano Todisco, Héctor Delgado, Kong-Aik Lee, Md. Sahidullah, Nicholas W. D. Evans, Tomi Kinnunen, Junichi Yamagishi, 
Integrated Presentation Attack Detection and Automatic Speaker Verification: Common Features and Gaussian Back-end Fusion.
SpeechComm2017
Cemal Hanilçi, Tomi Kinnunen, Md. Sahidullah, Aleksandr Sizov, 
Spoofing detection goes noisy: An analysis of synthetic speech detection in the presence of additive noise.
ICASSP2017
Anssi Kanervisto, Ville Vestman, Md. Sahidullah, Ville Hautamäki, Tomi Kinnunen, 
Effects of gender information in text-independent and text-dependent speaker verification.
ICASSP2017
Tomi Kinnunen, Md. Sahidullah, Mauro Falcone, Luca Costantini, Rosa González Hautamäki, Dennis Alexander Lehmann Thomsen, Achintya Kumar Sarkar, Zheng-Hua Tan, Héctor Delgado, Massimiliano Todisco, Nicholas W. D. Evans, Ville Hautamäki, Kong-Aik Lee, 
RedDots replayed: A new replay spoofing attack corpus for text-dependent speaker verification research.
ICASSP2017
Dipjyoti Paul, Md. Sahidullah, Goutam Saha 0001, 
Generalization of spoofing countermeasures: A case study with ASVspoof 2015 and BTAS 2016 corpora.
Interspeech2017
Tomi Kinnunen, Md. Sahidullah, Héctor Delgado, Massimiliano Todisco, Nicholas W. D. Evans, Junichi Yamagishi, Kong-Aik Lee, 
The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection.
ICASSP2022
Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
Interspeech2022
Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li 0001, Xie Chen 0001, Yu Wu 0012, Yifan Gong 0001, 
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.
Interspeech2022
Xiaofei Wang 0009, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka, 
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation.
Interspeech2022
Wangyou Zhang, Zhuo Chen 0006, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei, 
Separating Long-Form Speech with Group-wise Permutation Invariant Training.
ICASSP2021
Naoyuki Kanda, Zhong Meng, Liang Lu 0001, Yashesh Gaur, Xiaofei Wang 0009, Zhuo Chen 0006, Takuya Yoshioka, 
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.
ICASSP2021
Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu 0001, Xie Chen 0001, Jinyu Li 0001, Yifan Gong 0001, 
Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition.
ICASSP2021
Yao Qian, Ximo Bian, Yu Shi 0001, Naoyuki Kanda, Leo Shen, Zhen Xiao, Michael Zeng 0001, 
Speech-Language Pre-Training for End-to-End Spoken Language Understanding.
ICASSP2021
Xiong Xiao, Naoyuki Kanda, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao 0008, Gang Liu, Yu Wu 0012, Jian Wu 0027, Shujie Liu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.
Interspeech2021
Liang Lu 0001, Naoyuki Kanda, Jinyu Li 0001, Yifan Gong 0001, 
Streaming Multi-Talker Speech Recognition with Joint Speaker Identification.
Interspeech2021
Liang Lu 0001, Zhong Meng, Naoyuki Kanda, Jinyu Li 0001, Yifan Gong 0001, 
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer.
Interspeech2021
Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
End-to-End Speaker-Attributed ASR with Transformer.
Interspeech2021
Naoyuki Kanda, Guoli Ye, Yu Wu 0012, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone.
Interspeech2021
Zhong Meng, Yu Wu 0012, Naoyuki Kanda, Liang Lu 0001, Xie Chen 0001, Guoli Ye, Eric Sun, Jinyu Li 0001, Yifan Gong 0001, 
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition.
Interspeech2021
Jian Wu 0027, Zhuo Chen 0006, Sanyuan Chen, Yu Wu 0012, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, 
Investigation of Practical Aspects of Single Channel Speech Separation for ASR.
Interspeech2020
Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, 
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers.
Interspeech2020
Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Takuya Yoshioka, 
Serialized Output Training for End-to-End Overlapped Speech Recognition.
ICASSP2019
Naoyuki Kanda, Yusuke Fujita, Shota Horiguchi, Rintaro Ikeshita, Kenji Nagamatsu, Shinji Watanabe 0001, 
Acoustic Modeling for Distant Multi-talker Speech Recognition with Single- and Multi-channel Branches.
Interspeech2019
Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe 0001, 
End-to-End Neural Speaker Diarization with Permutation-Free Objectives.
ICASSP2022
Tianchi Liu 0004, Rohan Kumar Das, Kong Aik Lee, Haizhou Li 0001, 
MFA: TDNN with Multi-Scale Frequency-Channel Attention for Text-Independent Speaker Verification with Short Utterances.
ICASSP2022
Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li 0001, 
Self-Supervised Speaker Recognition with Loss-Gated Learning.
TASLP2021
Jichen Yang, Hongji Wang, Rohan Kumar Das, Yanmin Qian, 
Modified Magnitude-Phase Spectrum Information for Spoofing Detection.
ICASSP2021
Rohan Kumar Das, Jichen Yang, Haizhou Li 0001, 
Data Augmentation with Signal Companding for Detection of Logical Access Attacks.
Interspeech2021
Rohan Kumar Das, Maulik C. Madhavi, Haizhou Li 0001, 
Diagnosis of COVID-19 Using Auditory Acoustic Cues.
ICASSP2020
Rohan Kumar Das, Haizhou Li 0001, 
On the Importance of Vocal Tract Constriction for Speaker Characterization: The Whispered Speech Study.
ICASSP2020
Rohan Kumar Das, Jichen Yang, Haizhou Li 0001, 
Assessing the Scope of Generalized Countermeasures for Anti-Spoofing.
ICASSP2020
Xuehao Zhou, Xiaohai Tian, Grandee Lee, Rohan Kumar Das, Haizhou Li 0001, 
End-to-End Code-Switching TTS with Cross-Lingual Language Model.
Interspeech2020
Tianchi Liu 0004, Rohan Kumar Das, Maulik C. Madhavi, Shengmei Shen, Haizhou Li 0001, 
Speaker-Utterance Dual Attention for Speaker and Utterance Verification.
Interspeech2020
Rohan Kumar Das, Xiaohai Tian, Tomi Kinnunen, Haizhou Li 0001, 
The Attacker's Perspective on Automatic Speaker Verification: An Overview.
Interspeech2020
Xiaoyi Qin, Ming Li 0026, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, Haizhou Li 0001, 
The INTERSPEECH 2020 Far-Field Speaker Verification Challenge.
Interspeech2020
Ruijie Tao, Rohan Kumar Das, Haizhou Li 0001, 
Audio-Visual Speaker Recognition with a Cross-Modal Discriminative Network.
Interspeech2020
Zhenzong Wu, Rohan Kumar Das, Jichen Yang, Haizhou Li 0001, 
Light Convolutional Neural Network with Feature Genuinization for Detection of Synthetic Speech Attacks.
TASLP2019
Jichen Yang, Rohan Kumar Das, Nina Zhou, 
Extraction of Octave Spectra Information for Spoofing Attack Detection.
ICASSP2019
Yi Zhou 0020, Xiaohai Tian, Haihua Xu, Rohan Kumar Das, Haizhou Li 0001, 
Cross-lingual Voice Conversion with Bilingual Phonetic Posteriorgram and Average Modeling.
Interspeech2019
Rohan Kumar Das, Haizhou Li 0001, 
Instantaneous Phase and Long-Term Acoustic Cues for Orca Activity Detection.
Interspeech2019
Rohan Kumar Das, Jichen Yang, Haizhou Li 0001, 
Long Range Acoustic Features for Spoofed Speech Detection.
Interspeech2019
Sarfaraz Jelil, Abhishek Shrivastava, Rohan Kumar Das, S. R. Mahadeva Prasanna, Rohit Sinha 0003, 
SpeechMarker: A Voice Based Multi-Level Attendance Application.
Interspeech2019
Kong Aik Lee, Ville Hautamäki, Tomi H. Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang 0019, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li 0001, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang 0039, Chunlei Zhang, Fahimeh Bahmaninezhad, Héctor Delgado, Massimiliano Todisco, 
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences.
Interspeech2019
Tianchi Liu 0004, Maulik C. Madhavi, Rohan Kumar Das, Haizhou Li 0001, 
A Unified Framework for Speaker and Utterance Verification.
ICASSP2022
Han Chen, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Self-Supervised Representation Learning for Unsupervised Anomalous Sound Detection Under Domain Shift.
ICASSP2022
Hang-Rui Hu, Yan Song 0001, Ying Liu, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Domain Robust Deep Embedding Learning for Speaker Recognition.
ICASSP2022
Yuxuan Xi, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Frontend Attributes Disentanglement for Speech Emotion Recognition.
Interspeech2022
Hang-Rui Hu, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Class-Aware Distribution Alignment based Unsupervised Domain Adaptation for Speaker Verification.
TASLP2021
Jian Tang, Jie Zhang 0042, Yan Song 0001, Ian McLoughlin 0001, Li-Rong Dai 0001, 
Multi-Granularity Sequence Alignment Mapping for Encoder-Decoder Based End-to-End ASR.
ICASSP2021
Ying Liu, Yan Song 0001, Ian McLoughlin 0001, Lin Liu, Li-Rong Dai 0001, 
An Effective Deep Embedding Learning Method Based on Dense-Residual Networks for Speaker Verification.
Interspeech2021
Hui Wang, Lin Liu, Yan Song 0001, Lei Fang, Ian McLoughlin 0001, Li-Rong Dai 0001, 
A Weight Moving Average Based Alternate Decoupled Learning Algorithm for Long-Tailed Language Identification.
Interspeech2021
Xu Zheng, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
An Effective Mutual Mean Teaching Based Domain Adaptation Method for Sound Event Detection.
ICASSP2020
Hui Wang, Yan Song 0001, Zengxi Li, Ian McLoughlin 0001, Li-Rong Dai 0001, 
An Online Speaker-aware Speech Separation Approach Based on Time-domain Representation.
ICASSP2020
Jie Yan, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, 
Task-Aware Mean Teacher Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection.
Interspeech2020
Ying Liu, Yan Song 0001, Yiheng Jiang, Ian McLoughlin 0001, Lin Liu, Li-Rong Dai 0001, 
An Effective Speaker Recognition Method Based on Joint Identification and Verification Supervisions.
Interspeech2020
Zi-qiang Zhang, Yan Song 0001, Jian-Shu Zhang, Ian McLoughlin 0001, Li-Rong Dai 0001, 
Semi-Supervised End-to-End ASR via Teacher-Student Learning with Conditional Posterior Distribution.
Interspeech2020
Xu Zheng, Yan Song 0001, Jie Yan, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection.
TASLP2019
Zengxi Li, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, 
Listening and Grouping: An Online Autoregressive Approach for Monaural Speech Separation.
ICASSP2019
Jian Sun, Wu Guo, Zhi Chen, Yan Song 0001, 
Topic Detection in Conversational Telephone Speech Using CNN with Multi-stream Inputs.
ICASSP2019
Jie Yan, Yan Song 0001, Wu Guo, Li-Rong Dai 0001, Ian McLoughlin 0001, Liang Chen, 
A Region Based Attention Method for Weakly Supervised Sound Event Detection and Classification.
Interspeech2019
Zhifu Gao, Yan Song 0001, Ian McLoughlin 0001, Pengcheng Li, Yiheng Jiang, Li-Rong Dai 0001, 
Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System.
Interspeech2019
Yiheng Jiang, Yan Song 0001, Ian McLoughlin 0001, Zhifu Gao, Li-Rong Dai 0001, 
An Effective Deep Embedding Learning Architecture for Speaker Verification.
TASLP2018
Ma Jin, Yan Song 0001, Ian McLoughlin 0001, Li-Rong Dai 0001, 
LID-Senones and Their Statistics for Language Identification.
ICASSP2018
Zengxi Li, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, 
Source-Aware Context Network for Single-Channel Multi-Speaker Speech Separation.
ICASSP2022
Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He, 
Large-Scale ASR Domain Adaptation Using Self- and Semi-Supervised Learning.
ICASSP2022
Qiujia Li, Yu Zhang 0033, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland, 
Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition.
ICASSP2022
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.
Interspeech2022
Shuo-Yiin Chang, Bo Li 0028, Tara N. Sainath, Chao Zhang, Trevor Strohman, Qiao Liang, Yanzhang He, 
Turn-Taking Prediction for Natural Conversational Speech.
Interspeech2022
Shaojin Ding, Phoenix Meadowlark, Yanzhang He, Lukasz Lew, Shivani Agrawal, Oleg Rybakov, 
4-bit Conformer with Native Quantization Aware Training for Speech Recognition.
Interspeech2022
Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw, 
Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition.
Interspeech2022
Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang 0016, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman, 
A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes.
Interspeech2022
Ke Hu, Tara N. Sainath, Yanzhang He, Rohit Prabhavalkar, Trevor Strohman, Sepand Mavandadi, Weiran Wang, 
Improving Deliberation by Text-Only and Semi-Supervised Training.
Interspeech2022
Bo Li 0028, Tara N. Sainath, Ruoming Pang, Shuo-Yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani, 
A Language Agnostic Multilingual Streaming On-Device ASR System.
Interspeech2022
Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, W. Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach, 
Improving Rare Word Recognition with LM-aware MWER Training.
ICASSP2021
Bo Li 0028, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han 0002, Qiao Liang, Yu Zhang 0033, Trevor Strohman, Yonghui Wu, 
A Better and Faster end-to-end Model for Streaming ASR.
ICASSP2021
Qiujia Li, David Qiu, Yu Zhang 0033, Bo Li 0028, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman, 
Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition.
ICASSP2021
David Qiu, Qiujia Li, Yanzhang He, Yu Zhang 0033, Bo Li 0028, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li 0133, Ke Hu, Tara N. Sainath, Ian McGraw, 
Learning Word-Level Confidence for Subword End-To-End ASR.
ICASSP2021
Jiahui Yu, Chung-Cheng Chiu, Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han 0002, Anmol Gulati, Yonghui Wu, Ruoming Pang, 
FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.
Interspeech2021
Rami Botros, Tara N. Sainath, Robert David, Emmanuel Guzman, Wei Li 0133, Yanzhang He, 
Tied & Reduced RNN-T Decoder.
Interspeech2021
David Qiu, Yanzhang He, Qiujia Li, Yu Zhang 0033, Liangliang Cao, Ian McGraw, 
Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction.
Interspeech2021
Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ding Zhao, Yiteng Huang, Arun Narayanan, Ian McGraw, 
Personalized Keyphrase Detection Using Speaker and Environment Information.
Interspeech2021
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Ruoming Pang, David Rybach, Cyril Allauzen, Ehsan Variani, James Qin, Quoc-Nam Le-The, Shuo-Yiin Chang, Bo Li 0028, Anmol Gulati, Jiahui Yu, Chung-Cheng Chiu, Diamantino Caseiro, Wei Li 0133, Qiao Liang, Pat Rondon, 
An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling.
ICASSP2020
Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Ruoming Pang, Yanzhang He, Trevor Strohman, Yonghui Wu, 
Towards Fast and Accurate Streaming End-To-End ASR.
ICASSP2020
Tara N. Sainath, Yanzhang He, Bo Li 0028, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li 0133, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alexander Gruenstein, Ke Hu, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirkó Visontai, Yonghui Wu, Yu Zhang 0033, Ding Zhao, 
A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency.
ICASSP2022
Zhengyang Chen, Sanyuan Chen, Yu Wu 0012, Yao Qian, Chengyi Wang 0002, Shujie Liu 0001, Yanmin Qian, Michael Zeng 0001, 
Large-Scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification.
ICASSP2022
Rui Wang, Junyi Ao, Long Zhou, Shujie Liu 0001, Zhihua Wei, Tom Ko, Qing Li, Yu Zhang 0006, 
Multi-View Self-Attention Based Transformer for Speaker Recognition.
ICASSP2022
Heming Wang, Yao Qian, Xiaofei Wang 0009, Yiming Wang, Chengyi Wang 0002, Shujie Liu 0001, Takuya Yoshioka, Jinyu Li 0001, DeLiang Wang, 
Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction.
ICASSP2022
Chengyi Wang 0002, Yu Wu 0012, Sanyuan Chen, Shujie Liu 0001, Jinyu Li 0001, Yao Qian, Zhenglu Yang, 
Improving Self-Supervised Learning for Speech Recognition with Intermediate Layer Supervision.
ICASSP2022
Long Zhou, Jinyu Li 0001, Eric Sun, Shujie Liu 0001, 
A Configurable Multilingual Model is All You Need to Recognize All Languages.
Interspeech2022
Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu 0001, Haizhou Li 0001, Tom Ko, Li-Rong Dai 0001, Jinyu Li 0001, Yao Qian, Furu Wei, 
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data.
Interspeech2022
Sanyuan Chen, Yu Wu 0012, Chengyi Wang 0002, Shujie Liu 0001, Zhuo Chen 0006, Peidong Wang, Gang Liu, Jinyu Li 0001, Jian Wu 0027, Xiangzhan Yu, Furu Wei, 
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
Interspeech2022
Shuo Ren, Shujie Liu 0001, Yu Wu 0012, Long Zhou, Furu Wei, 
Speech Pre-training with Acoustic Piece.
Interspeech2022
Chengyi Wang 0002, Yiming Wang, Yu Wu 0012, Sanyuan Chen, Jinyu Li 0001, Shujie Liu 0001, Furu Wei, 
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training.
Interspeech2022
Wangyou Zhang, Zhuo Chen 0006, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei, 
Separating Long-Form Speech with Group-wise Permutation Invariant Training.
ACL2022
Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang 0002, Shuo Ren, Yu Wu 0012, Shujie Liu 0001, Tom Ko, Qing Li, Yu Zhang 0006, Zhihua Wei, Yao Qian, Jinyu Li 0001, Furu Wei, 
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing.
ICASSP2021
Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Takuya Yoshioka, Shujie Liu 0001, Jinyu Li 0001, Xiangzhan Yu, 
Don't Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer.
ICASSP2021
Xie Chen 0001, Yu Wu 0012, Zhenghao Wang, Shujie Liu 0001, Jinyu Li 0001, 
Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset.
ICASSP2021
Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Jinyu Li 0001, Takuya Yoshioka, Chengyi Wang 0002, Shujie Liu 0001, Ming Zhou 0001, 
Continuous Speech Separation with Conformer.
ICASSP2021
Xiong Xiao, Naoyuki Kanda, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao 0008, Gang Liu, Yu Wu 0012, Jian Wu 0027, Shujie Liu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.
Interspeech2021
Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Shujie Liu 0001, Jinyu Li 0001, Xiangzhan Yu, 
Ultra Fast Speech Separation Model with Teacher Student Learning.
Interspeech2021
Eric Sun, Jinyu Li 0001, Zhong Meng, Yu Wu 0012, Jian Xue, Shujie Liu 0001, Yifan Gong 0001, 
Improving Multilingual Transformer Transducer Models by Reducing Language Confusions.
Interspeech2021
Jian Wu 0027, Zhuo Chen 0006, Sanyuan Chen, Yu Wu 0012, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, 
Investigation of Practical Aspects of Single Channel Speech Separation for ASR.
ICML2021
Chengyi Wang 0002, Yu Wu 0012, Yao Qian, Ken'ichi Kumatani, Shujie Liu 0001, Furu Wei, Michael Zeng 0001, Xuedong Huang 0001, 
UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data.
Interspeech2020
Chengyi Wang 0002, Yu Wu 0012, Yujiao Du, Jinyu Li 0001, Shujie Liu 0001, Liang Lu 0001, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou 0001, 
Semantic Mask for Transformer Based End-to-End Speech Recognition.
Interspeech2022
Christoph Böddeker, Tobias Cord-Landwehr, Thilo von Neumann, Reinhold Haeb-Umbach, 
An Initialization Scheme for Meeting Separation with Spatial Mixture Models.
Interspeech2022
Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Christoph Böddeker, Reinhold Haeb-Umbach, 
Utterance-by-utterance overlap-aware neural diarization with Graph-PIT.
Interspeech2022
Michael Kuhlmann, Fritz Seebauer, Janek Ebbers, Petra Wagner, Reinhold Haeb-Umbach, 
Investigation into Target Speaking Rate Adaptation for Voice Conversion.
ICASSP2021
Christoph Böddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Reinhold Haeb-Umbach, 
Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.
Interspeech2021
Thilo von Neumann, Keisuke Kinoshita, Christoph Böddeker, Marc Delcroix, Reinhold Haeb-Umbach, 
Graph-PIT: Generalized Permutation Invariant Training for Continuous Separation of Arbitrary Numbers of Speakers.
TASLP2020
Tomohiro Nakatani, Christoph Böddeker, Keisuke Kinoshita, Rintaro Ikeshita, Marc Delcroix, Reinhold Haeb-Umbach, 
Jointly Optimal Denoising, Dereverberation, and Source Separation.
ICASSP2020
Jens Heitkaemper, Darius Jakobeit, Christoph Böddeker, Lukas Drude, Reinhold Haeb-Umbach, 
Demystifying TasNet: A Dissecting Approach.
ICASSP2020
Thilo von Neumann, Keisuke Kinoshita, Lukas Drude, Christoph Böddeker, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach, 
End-to-End Training of Time Domain Audio Separation and Recognition.
Interspeech2020
Jens Heitkaemper, Joerg Schmalenstroeer, Reinhold Haeb-Umbach, 
Statistical and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic Environments.
Interspeech2020
Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach, 
Multi-Path RNN for Hierarchical Modeling of Long Sequential Data and its Application to Speaker Stream Separation.
Interspeech2020
Thilo von Neumann, Christoph Böddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach, 
Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR.
ICASSP2019
Jahn Heymann, Lukas Drude, Reinhold Haeb-Umbach, Keisuke Kinoshita, Tomohiro Nakatani, 
Joint Optimization of Neural Network-based WPE Dereverberation and Acoustic Model for Robust Online ASR.
ICASSP2019
Thilo von Neumann, Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani, Reinhold Haeb-Umbach, 
All-neural Online Source Separation, Counting, and Diarization for Meeting Analysis.
Interspeech2019
Lukas Drude, Jahn Heymann, Reinhold Haeb-Umbach, 
Unsupervised Training of Neural Mask-Based Beamforming.
Interspeech2019
Naoyuki Kanda, Christoph Böddeker, Jens Heitkaemper, Yusuke Fujita, Shota Horiguchi, Kenji Nagamatsu, Reinhold Haeb-Umbach, 
Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR.
Interspeech2019
Juan M. Martín-Doñas, Jens Heitkaemper, Reinhold Haeb-Umbach, Angel M. Gomez, Antonio M. Peinado, 
Multi-Channel Block-Online Source Extraction Based on Utterance Adaptation.
Interspeech2019
Alexandru Nelus, Janek Ebbers, Reinhold Haeb-Umbach, Rainer Martin 0001, 
Privacy-Preserving Variational Information Feature Extraction for Domestic Activity Monitoring versus Speaker Identification.
SpeechComm2018
Vladimir Despotovic, Oliver Walter, Reinhold Haeb-Umbach, 
Machine learning techniques for semantic analysis of dysarthric speech: An experimental study.
Interspeech2018
Lukas Drude, Christoph Böddeker, Jahn Heymann, Reinhold Haeb-Umbach, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, 
Integrating Neural Network Based Beamforming and Weighted Prediction Error Dereverberation.
Interspeech2018
Thomas Glarner, Patrick Hanebrink, Janek Ebbers, Reinhold Haeb-Umbach, 
Full Bayesian Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery.
ICASSP2022
Anastasios Alexandridis, Grant P. Strimel, Ariya Rastrow, Pavel Kveton, Jon Webb, Maurizio Omologo, Siegfried Kunzmann, Athanasios Mouchtaris, 
Caching Networks: Capitalizing on Common Speech for ASR.
ICASSP2022
Liyan Xu, Yile Gu, Jari Kolehmainen, Haidar Khan, Ankur Gandhe, Ariya Rastrow, Andreas Stolcke, Ivan Bulyko, 
RescoreBERT: Discriminative Speech Recognition Rescoring With Bert.
Interspeech2022
Phani Sankar Nidadavolu, Na Xu, Nick Jutila, Ravi Teja Gadde, Aswarth Abhilash Dara, Joseph Savold, Sapan Patel, Aaron Hoff, Veerdhawal Pande, Kevin Crews, Ankur Gandhe, Ariya Rastrow, Roland Maas, 
RefTextLAS: Reference Text Biased Listen, Attend, and Spell Model For Accurate Reading Evaluation.
Interspeech2022
Anirudh Raju, Milind Rao, Gautam Tiwari, Pranav Dheram, Bryan Anderson, Zhe Zhang, Chul Lee, Bach Bui, Ariya Rastrow, 
On joint training with interfaces for spoken language understanding.
Interspeech2022
Yi Xie, Jonathan Macoskey, Martin Radfar, Feng-Ju Chang, Brian King, Ariya Rastrow, Athanasios Mouchtaris, Grant P. Strimel, 
Compute Cost Amortized Transformer for Streaming ASR.
Interspeech2022
Kai Zhen, Hieu Duy Nguyen, Raviteja Chinta, Nathan Susanj, Athanasios Mouchtaris, Tariq Afzal, Ariya Rastrow, 
Sub-8-Bit Quantization Aware Training for 8-Bit Neural Network Accelerator with On-Device Speech Recognition.
KDD2022
Gopinath Chennupati, Milind Rao, Gurpreet Chadha, Aaron Eakin, Anirudh Raju, Gautam Tiwari, Anit Kumar Sahu, Ariya Rastrow, Jasha Droppo, Andy Oberlin, Buddha Nandanoor, Prahalad Venkataramanan, Zheng Wu, Pankaj Sitpure, 
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale.
ICASSP2021
Hu Hu, Xuesong Yang, Zeynab Raeesy, Jinxi Guo, Gokce Keskin, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Roland Maas, 
REDAT: Accent-Invariant Representation for End-To-End ASR by Domain Adversarial Training with Relabeling.
ICASSP2021
Linda Liu, Yile Gu, Aditya Gourav, Ankur Gandhe, Shashank Kalmane, Denis Filimonov, Ariya Rastrow, Ivan Bulyko, 
Domain-Aware Neural Language Models for Speech Recognition.
ICASSP2021
Jon Macoskey, Grant P. Strimel, Ariya Rastrow, 
Bifocal Neural ASR: Exploiting Keyword Spotting for Inference Optimization.
ICASSP2021
Surabhi Punjabi, Harish Arsikere, Zeynab Raeesy, Chander Chandak, Nikhil Bhave, Ankish Bansal, Markus Müller, Sergio Murillo, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Sri Garimella, Roland Maas, Mat Hans, Athanasios Mouchtaris, Siegfried Kunzmann, 
Joint ASR and Language Identification Using RNN-T: An Efficient Approach to Dynamic Language Switching.
ICASSP2021
Milind Rao, Pranav Dheram, Gautam Tiwari, Anirudh Raju, Jasha Droppo, Ariya Rastrow, Andreas Stolcke, 
DO as I Mean, Not as I Say: Sequence Loss Training for Spoken Language Understanding.
Interspeech2021
Jonathan Macoskey, Grant P. Strimel, Ariya Rastrow, 
Learning a Neural Diff for Speech Models.
Interspeech2021
Jonathan Macoskey, Grant P. Strimel, Jinru Su, Ariya Rastrow, 
Amortized Neural Networks for Low-Latency Speech Recognition.
Interspeech2021
Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann, Ariya Rastrow, 
FANS: Fusing ASR and NLU for On-Device SLU.
Interspeech2021
Swayambhu Nath Ray, Minhua Wu, Anirudh Raju, Pegah Ghahremani, Raghavendra Bilgi, Milind Rao, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, 
Listen with Intent: Improving Speech Recognition with Audio-to-Intent Front-End.
Interspeech2021
Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas, 
wav2vec-C: A Self-Supervised Model for Speech Representation Learning.
ICASSP2020
Ankur Gandhe, Ariya Rastrow, 
Audio-Attention Discriminative Language Model for ASR Rescoring.
Interspeech2020
Milind Rao, Anirudh Raju, Pranav Dheram, Bach Bui, Ariya Rastrow, 
Speech to Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces.
Interspeech2020
Grant P. Strimel, Ariya Rastrow, Gautam Tiwari, Adrien Piérard, Jon Webb, 
Rescore in a Flash: Compact, Cache Efficient Hashing Data Structures for n-Gram Language Models.
Interspeech2022
Marcely Zanon Boito, Laurent Besacier, Natalia A. Tomashenko, Yannick Estève, 
A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems.
Interspeech2022
Valentin Pelloin, Franck Dary, Nicolas Hervé, Benoît Favre, Nathalie Camelin, Antoine Laurent, Laurent Besacier, 
ASR-Generated Text for Language Model Pre-training Applied to Speech Tasks.
Interspeech2022
Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber, 
BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model.
Interspeech2021
Solène Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia A. Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Estève, Benjamin Lecouteux, François Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier, 
LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech.
Interspeech2021
Ha Nguyen, Yannick Estève, Laurent Besacier, 
Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation.
Interspeech2021
Brooke Stephenson, Thomas Hueber, Laurent Girin, Laurent Besacier, 
Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input.
TASLP2020
Odette Scharenborg, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller 0001, 
Speech Technology for Unwritten Languages.
ICASSP2020
Marco Dinarelli, Nikita Kapoor, Bassam Jabaian, Laurent Besacier, 
A Data Efficient End-to-End Spoken Language Understanding Architecture.
Interspeech2020
Ewan Dunbar, Julien Karadayi, Mathieu Bernard, Xuan-Nga Cao, Robin Algayres, Lucas Ondel, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux, 
The Zero Resource Speech Challenge 2020: Discovering Discrete Subword and Word Units.
Interspeech2020
Maha Elbayad, Laurent Besacier, Jakob Verbeek, 
Efficient Wait-k Models for Simultaneous Machine Translation.
Interspeech2020
Ha Nguyen, Fethi Bougares, Natalia A. Tomashenko, Yannick Estève, Laurent Besacier, 
Investigating Self-Supervised Pre-Training for End-to-End Speech Translation.
Interspeech2020
Vaishali Pal, Fabien Guillot, Manish Shrivastava 0001, Jean-Michel Renders, Laurent Besacier, 
Modeling ASR Ambiguity for Neural Dialogue State Tracking.
Interspeech2020
Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber, 
What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS.
ICASSP2019
William N. Havard, Jean-Pierre Chevrot, Laurent Besacier, 
Models of Visually Grounded Speech Signal Pay Attention to Nouns: A Bilingual Experiment on English and Japanese.
Interspeech2019
Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier, 
Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-Resource Settings.
Interspeech2019
Ewan Dunbar, Robin Algayres, Julien Karadayi, Mathieu Bernard, Juan Benjumea, Xuan-Nga Cao, Lucie Miskic, Charlotte Dugrain, Lucas Ondel, Alan W. Black, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux, 
The Zero Resource Speech Challenge 2019: TTS Without T.
ICASSP2018
Zied Elloumi, Laurent Besacier, Olivier Galibert, Juliette Kahn, Benjamin Lecouteux, 
ASR Performance Prediction on Unseen Broadcast Programs Using Convolutional Neural Networks.
ICASSP2018
Lucas Ondel, Pierre Godard, Laurent Besacier, Elin Larsen, Mark Hasegawa-Johnson, Odette Scharenborg, Emmanuel Dupoux, Lukás Burget, François Yvon, Sanjeev Khudanpur, 
Bayesian Models for Unit Discovery on a Very Low Resource Language.
ICASSP2018
Odette Scharenborg, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller 0001, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, 
Linguistic Unit Discovery from Multi-Modal Inputs in Unwritten Languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop.
Interspeech2018
Pierre Godard, Marcely Zanon Boito, Lucas Ondel, Alexandre Berard, François Yvon, Aline Villavicencio, Laurent Besacier, 
Unsupervised Word Segmentation from Speech with Attention.
ICASSP2022
Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He, 
Large-Scale ASR Domain Adaptation Using Self- and Semi-Supervised Learning.
Interspeech2022
Theresa Breiner, Swaroop Ramaswamy, Ehsan Variani, Shefali Garg, Rajiv Mathews, Khe Chai Sim, Kilol Gupta, Mingqing Chen, Lara McConnaughey, 
UserLibri: A Dataset for ASR Personalization Using Only Text.
Interspeech2022
Zhouyuan Huo, Dongseong Hwang, Khe Chai Sim, Shefali Garg, Ananya Misra, Nikhil Siddhartha, Trevor Strohman, Françoise Beaufays, 
Incremental Layer-Wise Self-Supervised Learning for Efficient Unsupervised Speech Domain Adaptation On Device.
Interspeech2022
Dongseong Hwang, Khe Chai Sim, Zhouyuan Huo, Trevor Strohman, 
Pseudo Label Is Better Than Human Label.
Interspeech2022
Golan Pundak, Tsendsuren Munkhdalai, Khe Chai Sim, 
On-the-fly ASR Corrections with Audio Exemplars.
Interspeech2021
Ananya Misra, Dongseong Hwang, Zhouyuan Huo, Shefali Garg, Nikhil Siddhartha, Arun Narayanan, Khe Chai Sim, 
A Comparison of Supervised and Unsupervised Pre-Training of End-to-End Models.
Interspeech2021
Khe Chai Sim, Angad Chandorkar, Fan Gao, Mason Chua, Tsendsuren Munkhdalai, Françoise Beaufays, 
Robust Continuous On-Device Personalization for Automatic Speech Recognition.
ICASSP2020
Mary Gooneratne, Khe Chai Sim, Petr Zadrazil, Andreas Kabel, Françoise Beaufays, Giovanni Motta, 
Low-Rank Gradient Approximation for Memory-Efficient on-Device Training of Deep Neural Network.
ICASSP2019
Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, Qiao Liang, Deepti Bhatia, Yuan Shangguan, Bo Li 0028, Golan Pundak, Khe Chai Sim, Tom Bagby, Shuo-Yiin Chang, Kanishka Rao, Alexander Gruenstein, 
Streaming End-to-end Speech Recognition for Mobile Devices.
ICASSP2019
Jahn Heymann, Khe Chai Sim, Bo Li 0028, 
Improving CTC Using Stimulated Learning for Sequence Modeling.
Interspeech2019
Khe Chai Sim, Petr Zadrazil, Françoise Beaufays, 
An Investigation into On-Device Personalization of End-to-End Automatic Speech Recognition Models.
TASLP2018
Chunyang Wu, Mark J. F. Gales, Anton Ragni, Penny Karanasou, Khe Chai Sim, 
Improving Interpretability and Regularization in Deep Learning.
ICASSP2018
Skanda Koppula, Khe Chai Sim, Kean K. Chin, 
Understanding Recurrent Neural State Using Memory Signatures.
ICASSP2018
Bo Li 0028, Tara N. Sainath, Khe Chai Sim, Michiel Bacchiani, Eugene Weinstein, Patrick Nguyen, Zhifeng Chen, Yonghui Wu, Kanishka Rao, 
Multi-Dialect Speech Recognition with a Single Sequence-to-Sequence Model.
ICASSP2018
Lahiru Samarakoon, Brian Mak, Khe Chai Sim, 
Learning Effective Factorized Hidden Layer Bases Using Student-Teacher Training for LSTM Acoustic Model Adaptation.
Interspeech2018
Khe Chai Sim, Arun Narayanan, Ananya Misra, Anshuman Tripathi, Golan Pundak, Tara N. Sainath, Parisa Haghani, Bo Li 0028, Michiel Bacchiani, 
Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition.
Interspeech2017
Bo Li 0028, Tara N. Sainath, Arun Narayanan, Joe Caroselli, Michiel Bacchiani, Ananya Misra, Izhak Shafran, Hasim Sak, Golan Pundak, Kean K. Chin, Khe Chai Sim, Ron J. Weiss, Kevin W. Wilson, Ehsan Variani, Chanwoo Kim 0001, Olivier Siohan, Mitchel Weintraub, Erik McDermott, Richard Rose, Matt Shannon, 
Acoustic Modeling for Google Home.
Interspeech2017
Lahiru Samarakoon, Brian Mak, Khe Chai Sim, 
Learning Factorized Transforms for Unsupervised Adaptation of LSTM-RNN Acoustic Models.
Interspeech2017
Khe Chai Sim, Arun Narayanan, 
An Efficient Phone N-Gram Forward-Backward Computation Using Dense Matrix Multiplication.
ICASSP2016
Lahiru Samarakoon, Khe Chai Sim, 
On combining i-vectors and discriminative adaptation methods for unsupervised speaker normalization in DNN acoustic models.
ICASSP2022
Ke Hu, Tara N. Sainath, Arun Narayanan, Ruoming Pang, Trevor Strohman, 
Transducer-Based Streaming Deliberation for Cascaded Encoders.
ICASSP2022
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.
Interspeech2022
Ehsan Amid, Om Dipakbhai Thakkar, Arun Narayanan, Rajiv Mathews, Françoise Beaufays, 
Extracting Targeted Training Data from ASR Models, and How to Mitigate It.
Interspeech2022
Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw, 
Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition.
Interspeech2022
Yuma Koizumi, Shigeki Karita, Arun Narayanan, Sankaran Panchapagesan, Michiel Bacchiani, 
SNRi Target Training for Joint Speech Enhancement and Recognition.
Interspeech2022
Thomas R. O'Malley, Arun Narayanan, Quan Wang, 
A universally-deployable ASR frontend for joint acoustic echo cancellation, speech enhancement, and voice separation.
Interspeech2022
Sankaran Panchapagesan, Arun Narayanan, Turaj Zakizadeh Shabestary, Shuai Shao, Nathan Howard, Alex Park 0001, James Walker, Alexander Gruenstein, 
A Conformer-based Waveform-domain Neural Acoustic Echo Canceller Optimized for ASR Accuracy.
ICASSP2021
Thibault Doutre, Wei Han 0002, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang 0033, Liangliang Cao, 
Improving Streaming Automatic Speech Recognition with Non-Streaming Model Distillation on Unsupervised Data.
ICASSP2021
Bo Li 0028, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han 0002, Qiao Liang, Yu Zhang 0033, Trevor Strohman, Yonghui Wu, 
A Better and Faster end-to-end Model for Streaming ASR.
ICASSP2021
Jiahui Yu, Chung-Cheng Chiu, Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han 0002, Anmol Gulati, Yonghui Wu, Ruoming Pang, 
FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.
Interspeech2021
Ananya Misra, Dongseong Hwang, Zhouyuan Huo, Shefali Garg, Nikhil Siddhartha, Arun Narayanan, Khe Chai Sim, 
A Comparison of Supervised and Unsupervised Pre-Training of End-to-End Models.
Interspeech2021
Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ding Zhao, Yiteng Huang, Arun Narayanan, Ian McGraw, 
Personalized Keyphrase Detection Using Speaker and Environment Information.
Interspeech2021
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Ruoming Pang, David Rybach, Cyril Allauzen, Ehsan Variani, James Qin, Quoc-Nam Le-The, Shuo-Yiin Chang, Bo Li 0028, Anmol Gulati, Jiahui Yu, Chung-Cheng Chiu, Diamantino Caseiro, Wei Li 0133, Qiao Liang, Pat Rondon, 
An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling.
ICASSP2020
Tara N. Sainath, Yanzhang He, Bo Li 0028, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li 0133, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alexander Gruenstein, Ke Hu, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirkó Visontai, Yonghui Wu, Yu Zhang 0033, Ding Zhao, 
A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency.
Interspeech2020
Antoine Bruguier, Ananya Misra, Arun Narayanan, Rohit Prabhavalkar, 
Anti-Aliasing Regularization in Stacking Layers.
ICASSP2018
Chanwoo Kim 0001, Tara N. Sainath, Arun Narayanan, Ananya Misra, Rajeev C. Nongpiur, Michiel Bacchiani, 
Spectral Distortion Model for Training Phase-Sensitive Deep-Neural Networks for Far-Field Speech Recognition.
Interspeech2018
Chanwoo Kim 0001, Ehsan Variani, Arun Narayanan, Michiel Bacchiani, 
Efficient Implementation of the Room Simulator for Training Deep Neural Network Acoustic Models.
Interspeech2018
Khe Chai Sim, Arun Narayanan, Ananya Misra, Anshuman Tripathi, Golan Pundak, Tara N. Sainath, Parisa Haghani, Bo Li 0028, Michiel Bacchiani, 
Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition.
TASLP2017
Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Bo Li 0028, Arun Narayanan, Ehsan Variani, Michiel Bacchiani, Izhak Shafran, Andrew W. Senior, Kean K. Chin, Ananya Misra, Chanwoo Kim 0001, 
Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition.
Interspeech2017
Joe Caroselli, Izhak Shafran, Arun Narayanan, Richard Rose, 
Adaptive Multichannel Dereverberation for Automatic Speech Recognition.
Interspeech2022
Visar Berisha, Chelsea Krantsevich, Gabriela Stegmann, Shira Hahn, Julie Liss, 
Are reported accuracies in the clinical speech machine learning literature overoptimistic?
Interspeech2022
Kelvin Tran, Lingfeng Xu, Gabriela Stegmann, Julie Liss, Visar Berisha, Rene Utianski, 
Investigating the Impact of Speech Compression on the Acoustics of Dysarthric Speech.
ICASSP2021
Vikram C. Mathad, Nancy Scherer, Kathy Chapman, Julie Liss, Visar Berisha, 
An Attention Model for Hypernasality Prediction in Children with Cleft Palate.
Interspeech2021
Vikram C. Mathad, Tristan J. Mahr, Nancy Scherer, Kathy Chapman, Katherine C. Hustad, Julie Liss, Visar Berisha, 
The Impact of Forced-Alignment Errors on Automatic Pronunciation Evaluation.
Interspeech2021
Jianwei Zhang, Suren Jayasuriya, Visar Berisha, 
Restoring Degraded Speech via a Modified Diffusion Model.
ICASSP2020
Vikram C. Mathad, Kathy Chapman, Julie Liss, Nancy Scherer, Visar Berisha, 
Deep Learning Based Prediction of Hypernasality for Clinical Applications.
Interspeech2020
Deepak Kadetotad, Jian Meng, Visar Berisha, Chaitali Chakrabarti, Jae-sun Seo, 
Compressing LSTM Networks with Hierarchical Coarse-Grain Sparsity.
Interspeech2020
Meredith Moore, Piyush Papreja, Michael Saxon, Visar Berisha, Sethuraman Panchanathan, 
UncommonVoice: A Crowdsourced Dataset of Dysphonic Speech.
ICASSP2019
Jacob Peplinski, Visar Berisha, Julie Liss, Shira Hahn, Jeremy Shefner, Seward B. Rutkove, Kristin Qi, Kerisa Shelton, 
Objective Assessment of Vocal Tremor.
ICASSP2019
Michael Saxon, Julie Liss, Visar Berisha, 
Objective Measures of Plosive Nasalization in Hypernasal Speech.
Interspeech2019
Nichola Lubold, Stephanie A. Borrie, Tyson S. Barrett, Megan M. Willi, Visar Berisha, 
Do Conversational Partners Entrain on Articulatory Precision?
Interspeech2019
Meredith Moore, Michael Saxon, Hemanth Venkateswara, Visar Berisha, Sethuraman Panchanathan, 
Say What? A Dataset for Exploring the Error Patterns That Two ASR Engines Make.
Interspeech2019
Rohit Voleti, Stephanie Woolridge, Julie M. Liss, Melissa Milanovic, Christopher R. Bowie, Visar Berisha, 
Objective Assessment of Social Skills Using Automated Language Analysis for Identification of Schizophrenia and Bipolar Disorder.
Interspeech2019
Yan Xiong, Visar Berisha, Chaitali Chakrabarti, 
Residual + Capsule Networks (ResCap) for Simultaneous Single-Channel Overlapped Keyword Recognition.
ICASSP2018
Yishan Jiao, Ming Tu, Visar Berisha, Julie Liss, 
Simulating Dysarthric Speech for Training Data Augmentation in Clinical Speech Applications.
Interspeech2018
Huan Song, Megan M. Willi, Jayaraman J. Thiagarajan, Visar Berisha, Andreas Spanias, 
Triplet Network with Attention for Speaker Diarization.
Interspeech2018
Ming Tu, Anna Grabek, Julie Liss, Visar Berisha, 
Investigating the Role of L1 in Automatic Pronunciation Evaluation of L2 Speech.
Interspeech2018
Megan M. Willi, Stephanie A. Borrie, Tyson S. Barrett, Ming Tu, Visar Berisha, 
A Discriminative Acoustic-Prosodic Approach for Measuring Local Entrainment.
ICASSP2017
Yishan Jiao, Visar Berisha, Julie Liss, 
Interpretable phonological features for clinical applications.
ICASSP2017
Ming Tu, Visar Berisha, Julie Liss, 
Objective assessment of pathological speech using distribution regression.
TASLP2022
Heming Wang, Xueliang Zhang 0001, DeLiang Wang, 
Fusing Bone-Conduction and Air-Conduction Sensors for Complex-Domain Speech Enhancement.
ICASSP2022
Jinjiang Liu, Xueliang Zhang 0001, 
DRC-NET: Densely Connected Recurrent Convolutional Neural Network for Speech Dereverberation.
ICASSP2022
Heming Wang, Xueliang Zhang 0001, DeLiang Wang, 
Attention-Based Fusion for Bone-Conducted and Air-Conducted Speech Enhancement in the Complex Domain.
ICASSP2022
Yang Yang, Hui Zhang 0031, Xueliang Zhang 0001, Huaiwen Zhang, 
Alleviating the Loss-Metric Mismatch in Supervised Single-Channel Speech Enhancement.
Interspeech2022
Jiahui Pan, Shuai Nie, Hui Zhang 0031, Shulin He, Kanghao Zhang, Shan Liang, Xueliang Zhang 0001, Jianhua Tao, 
Speaker recognition-assisted robust audio deepfake detection.
Interspeech2022
Chenggang Zhang, Jinjiang Liu, Xueliang Zhang 0001, 
LCSM: A Lightweight Complex Spectral Mapping Framework for Stereophonic Acoustic Echo Cancellation.
TASLP2021
Hao Li 0046, DeLiang Wang, Xueliang Zhang 0001, Guanglai Gao, 
Recurrent Neural Networks and Acoustic Features for Frame-Level Signal-to-Noise Ratio Estimation.
ICASSP2021
Ke Tan 0001, Xueliang Zhang 0001, DeLiang Wang, 
Real-Time Speech Enhancement for Mobile Communication Based on Dual-Channel Complex Spectral Mapping.
Interspeech2021
Jinjiang Liu, Xueliang Zhang 0001, 
Inplace Gated Convolutional Recurrent Neural Network for Dual-Channel Speech Enhancement.
Interspeech2021
Kanghao Zhang, Shulin He, Hao Li 0046, Xueliang Zhang 0001, 
DBNet: A Dual-Branch Network Architecture Processing on Spectrum and Waveform for Single-Channel Speech Enhancement.
TASLP2020
Zhihao Du, Xueliang Zhang 0001, Jiqing Han, 
A Joint Framework of Denoising Autoencoder and Generative Vocoder for Monaural Speech Enhancement.
Interspeech2020
Zhihao Du, Jiqing Han, Xueliang Zhang 0001, 
Double Adversarial Network Based Monaural Speech Enhancement for Robust Speech Recognition.
Interspeech2020
Hao Li 0046, DeLiang Wang, Xueliang Zhang 0001, Guanglai Gao, 
Frame-Level Signal-to-Noise Ratio Estimation Using Deep Learning.
Interspeech2020
Tianjiao Xu, Hui Zhang 0031, Xueliang Zhang 0001, 
Polishing the Classical Likelihood Ratio Test by Supervised Learning for Voice Activity Detection.
Interspeech2020
Chenggang Zhang, Xueliang Zhang 0001, 
A Robust and Cascaded Acoustic Echo Cancellation Based on Deep Learning.
TASLP2019
Zhong-Qiu Wang, Xueliang Zhang 0001, DeLiang Wang, 
Robust Speaker Localization Guided by Deep Learning-Based Time-Frequency Masking.
Interspeech2019
Yun Liu, Hui Zhang 0031, Xueliang Zhang 0001, Yuhang Cao, 
Investigation of Cost Function for Supervised Monaural Speech Separation.
TASLP2018
Shuai Nie, Shan Liang, Wenju Liu, Xueliang Zhang 0001, Jianhua Tao, 
Deep Learning Based Speech Separation via NMF-Style Reconstructions.
Interspeech2018
Yun Liu, Hui Zhang 0031, Xueliang Zhang 0001, 
Using Shifted Real Spectrum Mask as Training Target for Supervised Speech Separation.
Interspeech2018
Zhong-Qiu Wang, Xueliang Zhang 0001, DeLiang Wang, 
Robust TDOA Estimation Based on Time-Frequency Masking and Deep Neural Networks.
ICASSP2022
Salima Mdhaffar, Jean-François Bonastre, Marc Tommasi, Natalia A. Tomashenko, Yannick Estève, 
Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition.
ICASSP2022
Natalia A. Tomashenko, Salima Mdhaffar, Marc Tommasi, Yannick Estève, Jean-François Bonastre, 
Privacy Attacks for Automatic Speech Recognition Acoustic Models in A Federated Learning Framework.
Interspeech2022
Pierre-Michel Bousquet, Mickael Rouvier, Jean-François Bonastre, 
Reliability criterion based on learning-phase entropy for speaker recognition with neural network.
Interspeech2022
Mohammad MohammadAmini, Driss Matrouf, Jean-François Bonastre, Sandipana Dowerah, Romain Serizel, Denis Jouvet, 
Barlow Twins self-supervised learning for robust speaker recognition.
Interspeech2021
Anaïs Chanclu, Imen Ben Amor, Cédric Gendrot, Emmanuel Ferragne, Jean-François Bonastre, 
Automatic Classification of Phonation Types in Spontaneous Speech: Towards a New Workflow for the Characterization of Speakers' Voice Quality.
Interspeech2021
Paul-Gauthier Noé, Mohammad MohammadAmini, Driss Matrouf, Titouan Parcollet, Andreas Nautsch, Jean-François Bonastre, 
Adversarial Disentanglement of Speaker Representation for Attribute-Driven Privacy Preservation.
Interspeech2021
Benjamin O'Brien, Natalia A. Tomashenko, Anaïs Chanclu, Jean-François Bonastre, 
Anonymous Speaker Clusters: Making Distinctions Between Anonymised Speech Recordings with Clustering Interface.
Interspeech2020
Adrien Gresse, Mathias Quillot, Richard Dufour, Jean-François Bonastre, 
Learning Voice Representation Using Knowledge Distillation for Automatic Voice Casting.
Interspeech2020
Ana Montalvo, José R. Calvo, Jean-François Bonastre, 
Multi-Task Learning for Voice Related Recognition Tasks.
Interspeech2020
Andreas Nautsch, Jose Patino 0001, Natalia A. Tomashenko, Junichi Yamagishi, Paul-Gauthier Noé, Jean-François Bonastre, Massimiliano Todisco, Nicholas W. D. Evans, 
The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment.
Interspeech2020
Paul-Gauthier Noé, Jean-François Bonastre, Driss Matrouf, Natalia A. Tomashenko, Andreas Nautsch, Nicholas W. D. Evans, 
Speech Pseudonymisation Assessment Using Voice Similarity Matrices.
Interspeech2020
Natalia A. Tomashenko, Brij Mohan Lal Srivastava, Xin Wang 0037, Emmanuel Vincent 0001, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Jose Patino 0001, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco, 
Introducing the VoicePrivacy Initiative.
ICASSP2019
Adrien Gresse, Mathias Quillot, Richard Dufour, Vincent Labatut, Jean-François Bonastre, 
Similarity Metric Based on Siamese Neural Networks for Voice Casting.
Interspeech2019
Itshak Lapidot, Jean-François Bonastre, 
Effects of Waveform PMF on Anti-Spoofing Detection.
TASLP2018
Waad Ben Kheder, Driss Matrouf, Moez Ajili, Jean-François Bonastre, 
A Unified Joint Model to Deal With Nuisance Variabilities in the i-Vector Space.
Interspeech2018
Moez Ajili, Jean-François Bonastre, Solange Rossato, 
Voice Comparison and Rhythm: Behavioral Differences between Target and Non-target Comparisons.
Interspeech2018
Itshak Lapidot, Héctor Delgado, Massimiliano Todisco, Nicholas W. D. Evans, Jean-François Bonastre, 
Speech Database and Protocol Validation Using Waveform Entropy.
Interspeech2017
Moez Ajili, Jean-François Bonastre, Waad Ben Kheder, Solange Rossato, Juliette Kahn, 
Homogeneity Measure Impact on Target and Non-Target Trials in Forensic Voice Comparison.
Interspeech2017
Adrien Gresse, Mickael Rouvier, Richard Dufour, Vincent Labatut, Jean-François Bonastre, 
Acoustic Pairing of Original and Dubbed Voices in the Context of Video Game Localization.
Interspeech2017
Kong-Aik Lee, Ville Hautamäki, Tomi Kinnunen, Anthony Larcher, Chunlei Zhang, Andreas Nautsch, Themos Stafylakis, Gang Liu 0001, Mickaël Rouvier, Wei Rao, Federico Alegre, J. Ma, Man-Wai Mak, Achintya Kumar Sarkar, Héctor Delgado, Rahim Saeidi, Hagai Aronowitz, Aleksandr Sizov, Hanwu Sun, Trung Hieu Nguyen 0001, G. Wang, Bin Ma 0001, Ville Vestman, Md. Sahidullah, M. Halonen, Anssi Kanervisto, Gaël Le Lan, Fahimeh Bahmaninezhad, Sergey Isadskiy, Christian Rathgeb, Christoph Busch 0001, Georgios Tzimiropoulos, Q. Qian, Z. Wang, Q. Zhao, T. Wang, H. Li, J. Xue, S. Zhu, R. Jin, T. Zhao, Pierre-Michel Bousquet, Moez Ajili, Waad Ben Kheder, Driss Matrouf, Zhi Hao Lim, Chenglin Xu, Haihua Xu, Xiong Xiao, Eng Siong Chng, Benoit G. B. Fauve, Kaavya Sriskandaraja, Vidhyasaharan Sethu, W. W. Lin, Dennis Alexander Lehmann Thomsen, Zheng-Hua Tan, Massimiliano Todisco, Nicholas W. D. Evans, Haizhou Li 0001, John H. L. Hansen, Jean-François Bonastre, Eliathamby Ambikairajah, 
The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016.
TASLP2022
Xiaobo Liang, Lijun Wu, Juntao Li, Tao Qin, Min Zhang 0005, Tie-Yan Liu, 
Multi-Teacher Distillation With Single Model for Neural Machine Translation.
Interspeech2022
Yihan Wu, Xu Tan 0003, Bohan Li 0003, Lei He 0005, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu, 
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios.
Interspeech2022
Guangyan Zhang, Kaitao Song, Xu Tan 0003, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao, 
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech.
NeurIPS2022
Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen, Xu Tan, Danilo P. Mandic, Lei He, Xiangyang Li, Tao Qin, Sheng Zhao, Tie-Yan Liu, 
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis.
ACL2022
Yi Ren 0006, Xu Tan 0003, Tao Qin, Zhou Zhao, Tie-Yan Liu, 
Revisiting Over-Smoothness in Text to Speech.
ICASSP2021
Yichong Leng, Xu Tan 0003, Sheng Zhao, Frank K. Soong, Xiang-Yang Li 0001, Tao Qin, 
MBNET: MOS Prediction for Synthesized Speech with Mean-Bias Network.
ICASSP2021
Renqian Luo, Xu Tan 0003, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen, Tie-Yan Liu, 
Lightspeech: Lightweight and Fast Text to Speech with Neural Architecture Search.
ICASSP2021
Linghui Meng 0001, Jin Xu 0010, Xu Tan 0003, Jindong Wang 0001, Tao Qin, Bo Xu 0002, 
MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition.
ICASSP2021
Yuzi Yan, Xu Tan 0003, Bohan Li 0003, Tao Qin, Sheng Zhao, Yuan Shen 0001, Tie-Yan Liu, 
Adaspeech 2: Adaptive Text to Speech with Untranscribed Data.
ICASSP2021
Chen Zhang 0020, Yi Ren 0006, Xu Tan 0003, Jinglin Liu, Kejun Zhang, Tao Qin, Sheng Zhao, Tie-Yan Liu, 
Denoispeech: Denoising Text to Speech with Frame-Level Noise Modeling.
Interspeech2021
Wenxin Hou, Jindong Wang 0001, Xu Tan 0003, Tao Qin, Takahiro Shinozaki, 
Cross-Domain Speech Recognition with Unsupervised Character-Level Distribution Matching.
Interspeech2021
Yuzi Yan, Xu Tan 0003, Bohan Li 0003, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan Shen 0001, Wei-Qiang Zhang, Tie-Yan Liu, 
Adaptive Text to Speech for Spontaneous Style.
NeurIPS2021
Jiawei Chen 0008, Xu Tan 0003, Yichong Leng, Jin Xu 0010, Guihua Wen, Tao Qin, Tie-Yan Liu, 
Speech-T: Transducer for Text to Speech and Beyond.
NeurIPS2021
Yichong Leng, Xu Tan 0003, Linchen Zhu, Jin Xu 0010, Renqian Luo, Linquan Liu, Tao Qin, Xiangyang Li 0001, Edward Lin, Tie-Yan Liu, 
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition.
ICLR2021
Yi Ren 0006, Chenxu Hu, Xu Tan 0003, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu, 
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.
ICLR2021
Mingjian Chen, Xu Tan 0003, Bohan Li 0003, Yanqing Liu, Tao Qin, Sheng Zhao, Tie-Yan Liu, 
AdaSpeech: Adaptive Text to Speech for Custom Voice.
AAAI2021
Chen Zhang 0020, Xu Tan 0003, Yi Ren 0006, Tao Qin, Kejun Zhang, Tie-Yan Liu, 
UWSpeech: Speech to Speech Translation for Unwritten Languages.
TASLP2020
Yang Fan, Fei Tian, Yingce Xia, Tao Qin, Xiang-Yang Li 0001, Tie-Yan Liu, 
Searching Better Architectures for Neural Machine Translation.
Interspeech2020
Mingjian Chen, Xu Tan 0003, Yi Ren 0006, Jin Xu 0010, Hao Sun, Sheng Zhao, Tao Qin, 
MultiSpeech: Multi-Speaker Text to Speech with Transformer.
KDD2020
Yi Ren 0006, Xu Tan 0003, Tao Qin, Jian Luan 0001, Zhou Zhao, Tie-Yan Liu, 
DeepSinger: Singing Voice Synthesis with Data Mined From the Web.
Interspeech2022
Robin Algayres, Adel Nabli, Benoît Sagot, Emmanuel Dupoux, 
Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning.
Interspeech2022
Maureen de Seyssel, Marvin Lavechin, Yossi Adi, Emmanuel Dupoux, Guillaume Wisniewski, 
Probing phoneme, language and speaker information in unsupervised speech representations.
ACL2022
Eugene Kharitonov, Ann Lee 0001, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu, 
Text-Free Prosody-Aware Generative Spoken Language Modeling.
Interspeech2021
Ewan Dunbar, Mathieu Bernard, Nicolas Hamilakis, Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Eugene Kharitonov, Emmanuel Dupoux, 
The Zero Resource Speech Challenge 2021: Spoken Language Modelling.
Interspeech2021
Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, 
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.
ACL2021
Changhan Wang, Morgane Rivière, Ann Lee 0001, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Miguel Pino, Emmanuel Dupoux, 
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation.
TASLP2020
Odette Scharenborg, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller 0001, 
Speech Technology for Unwritten Languages.
ICASSP2020
Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdelrahman Mohamed, Emmanuel Dupoux, 
Libri-Light: A Benchmark for ASR with Limited or No Supervision.
ICASSP2020
Morgane Rivière, Armand Joulin, Pierre-Emmanuel Mazaré, Emmanuel Dupoux, 
Unsupervised Pretraining Transfers Well Across Languages.
Interspeech2020
Robin Algayres, Mohamed Salah Zaïem, Benoît Sagot, Emmanuel Dupoux, 
Evaluating the Reliability of Acoustic Speech Embeddings.
Interspeech2020
Ewan Dunbar, Julien Karadayi, Mathieu Bernard, Xuan-Nga Cao, Robin Algayres, Lucas Ondel, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux, 
The Zero Resource Speech Challenge 2020: Discovering Discrete Subword and Word Units.
Interspeech2020
Marvin Lavechin, Ruben Bousbib, Hervé Bredin, Emmanuel Dupoux, Alejandrina Cristià, 
An Open-Source Voice Type Classifier for Child-Centered Daylong Recordings.
Interspeech2020
Rachid Riad, Hadrien Titeux, Laurie Lemoine, Justine Montillot, Jennifer Hamet Bagnou, Xuan-Nga Cao, Emmanuel Dupoux, Anne-Catherine Bachoud-Lévi, 
Vocal Markers from Sustained Phonation in Huntington's Disease.
Interspeech2019
Ewan Dunbar, Robin Algayres, Julien Karadayi, Mathieu Bernard, Juan Benjumea, Xuan-Nga Cao, Lucie Miskic, Charlotte Dugrain, Lucas Ondel, Alan W. Black, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux, 
The Zero Resource Speech Challenge 2019: TTS Without T.
ICASSP2018
Lucas Ondel, Pierre Godard, Laurent Besacier, Elin Larsen, Mark Hasegawa-Johnson, Odette Scharenborg, Emmanuel Dupoux, Lukás Burget, François Yvon, Sanjeev Khudanpur, 
Bayesian Models for Unit Discovery on a Very Low Resource Language.
ICASSP2018
Odette Scharenborg, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller 0001, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, 
Linguistic Unit Discovery from Multi-Modal Inputs in Unwritten Languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop.
ICASSP2018
Neil Zeghidour, Nicolas Usunier, Iasonas Kokkinos, Thomas Schatz, Gabriel Synnaeve, Emmanuel Dupoux, 
Learning Filterbanks from Raw Speech for Phone Recognition.
Interspeech2018
Nils Holzenberger, Mingxing Du, Julien Karadayi, Rachid Riad, Emmanuel Dupoux, 
Learning Word Embeddings: Unsupervised Methods for Fixed-size Representations of Variable-length Speech Segments.
Interspeech2018
Rachid Riad, Corentin Dancette, Julien Karadayi, Neil Zeghidour, Thomas Schatz, Emmanuel Dupoux, 
Sampling Strategies in Siamese Networks for Unsupervised Speech Representation Learning.
Interspeech2018
Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert, Emmanuel Dupoux, 
End-to-End Speech Recognition from the Raw Waveform.
ICASSP2022
Yixuan Zhang, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li 0001, 
Continuous Speech Separation with Recurrent Selective Attention Network.
Interspeech2022
Sanyuan Chen, Yu Wu 0012, Chengyi Wang 0002, Shujie Liu 0001, Zhuo Chen 0006, Peidong Wang, Gang Liu, Jinyu Li 0001, Jian Wu 0027, Xiangzhan Yu, Furu Wei, 
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
ICASSP2021
Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Jinyu Li 0001, Takuya Yoshioka, Chengyi Wang 0002, Shujie Liu 0001, Ming Zhou 0001, 
Continuous Speech Separation with Conformer.
ICASSP2021
Amit Das, Kshitiz Kumar, Jian Wu 0027, 
Multi-Dialect Speech Recognition in English Using Attention on Ensemble of Experts.
ICASSP2021
Xiong Xiao, Naoyuki Kanda, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao 0008, Gang Liu, Yu Wu 0012, Jian Wu 0027, Shujie Liu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.
Interspeech2021
Amber Afshan, Kshitiz Kumar, Jian Wu 0027, 
Sequence-Level Confidence Classifier for ASR Utterance Accuracy and Application to Acoustic Models.
Interspeech2021
Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Shujie Liu 0001, Jinyu Li 0001, Xiangzhan Yu, 
Ultra Fast Speech Separation Model with Teacher Student Learning.
Interspeech2021
Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen 0006, Yanxin Hu, Lei Xie 0001, Jian Wu 0027, Hui Bu, Xin Xu, Jun Du, Jingdong Chen, 
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario.
Interspeech2021
Jian Wu 0027, Zhuo Chen 0006, Sanyuan Chen, Yu Wu 0012, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, 
Investigation of Practical Aspects of Single Channel Speech Separation for ASR.
ICASSP2020
Zhuo Chen 0006, Takuya Yoshioka, Liang Lu 0001, Tianyan Zhou, Zhong Meng, Yi Luo 0004, Jian Wu 0027, Xiong Xiao, Jinyu Li 0001, 
Continuous Speech Separation: Dataset and Analysis.
ICASSP2020
Eva Sharma, Guoli Ye, Wenning Wei, Rui Zhao 0017, Yao Tian, Jian Wu 0027, Lei He 0005, Ed Lin, Yifan Gong 0001, 
Adaptation of RNN Transducer with Text-To-Speech Technology for Keyword Spotting.
ICASSP2020
Jixuan Wang, Xiong Xiao, Jian Wu 0027, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno, 
Speaker Diarization with Session-Level Speaker Embedding Refinement Using Graph Neural Networks.
ICASSP2020
Jianwei Yu, Shi-Xiong Zhang, Jian Wu 0027, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu 0001, 
Audio-Visual Recognition of Overlapped Speech for the LRS2 Dataset.
ICASSP2020
Yong Zhao 0008, Tianyan Zhou, Zhuo Chen 0006, Jian Wu 0027, 
Improving Deep CNN Networks with Long Temporal Context for Text-Independent Speaker Verification.
Interspeech2020
Yanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu 0027, Bihong Zhang, Lei Xie 0001, 
DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement.
Interspeech2020
Kshitiz Kumar, Chaojun Liu, Yifan Gong 0001, Jian Wu 0027, 
1-D Row-Convolution LSTM: Fast Streaming ASR at Accuracy Parity with LC-BLSTM.
Interspeech2020
Kshitiz Kumar, Bo Ren, Yifan Gong 0001, Jian Wu 0027, 
Bandpass Noise Generation and Augmentation for Unified ASR.
Interspeech2020
Kshitiz Kumar, Emilian Stoimenov, Hosam Khalil, Jian Wu 0027, 
Fast and Slow Acoustic Model.
ICASSP2022
Shijing Si, Jianzong Wang, Junqing Peng, Jing Xiao 0006, 
Towards Speaker Age Estimation With Label Distribution Learning.
ICASSP2022
Huaizhen Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Avqvc: One-Shot Voice Conversion By Vector Quantization With Applying Contrastive Learning.
ICASSP2022
Qiqi Wang 0005, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning.
ICASSP2022
Tong Ye, Shijing Si, Jianzong Wang, Rui Wang, Ning Cheng 0001, Jing Xiao 0006, 
VU-BERT: A Unified Framework for Visual Dialog.
ICASSP2022
Yong Zhang, Zhitao Li, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Self-Attention for Incomplete Utterance Rewriting.
ICASSP2022
Chendong Zhao, Jianzong Wang, Xiaoyang Qu, Haoqian Wang, Jing Xiao 0006, 
r-G2P: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion by Controlled Noise Introducing and Contextual Information Incorporation.
ICASSP2022
Botao Zhao, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-Shot Multi-speaker text-to-speech.
Interspeech2022
Zuheng Kang, Junqing Peng, Jianzong Wang, Jing Xiao 0006, 
SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning.
Interspeech2022
Jian Luo, Jianzong Wang, Ning Cheng 0001, Edward Xiao, Xulong Zhang 0001, Jing Xiao 0006, 
Tiny-Sepformer: A Tiny Time-Domain Transformer Network For Speech Separation.
Interspeech2022
Sicheng Yang, Methawee Tantrawenith, Haolin Zhuang, Zhiyong Wu 0001, Aolan Sun, Jianzong Wang, Ning Cheng 0001, Huaizhen Tang, Xintao Zhao, Jie Wang, Helen Meng, 
Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion.
Interspeech2022
Tong Ye, Shijing Si, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Uncertainty Calibration for Deep Audio Classifiers.
ICASSP2021
Yanfei Hui, Jianzong Wang, Ning Cheng 0001, Fengying Yu, Tianbo Wu, Jing Xiao 0006, 
Joint Intent Detection and Slot Filling Based on Continual Learning Model.
ICASSP2021
Jian Luo, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Unidirectional Memory-Self-Attention Transducer for Online Speech Recognition.
ICASSP2021
Zhen Zeng, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation.
Interspeech2021
Zhenhou Hong, Jianzong Wang, Xiaoyang Qu, Jie Liu, Chendong Zhao, Jing Xiao 0006, 
Federated Learning with Dynamic Transformer for Text to Speech.
Interspeech2021
Jian Luo, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation.
Interspeech2021
Junyi Peng, Xiaoyang Qu, Rongzhi Gu, Jianzong Wang, Jing Xiao 0006, Lukás Burget, Jan Cernocký, 
Effective Phase Encoding for End-To-End Speaker Verification.
Interspeech2021
Junyi Peng, Xiaoyang Qu, Jianzong Wang, Rongzhi Gu, Jing Xiao 0006, Lukás Burget, Jan Cernocký, 
ICSpk: Interpretable Complex Speaker Embedding Extractor from Raw Waveform.
Interspeech2021
Shijing Si, Jianzong Wang, Xiaoyang Qu, Ning Cheng 0001, Wenqi Wei, Xinghua Zhu, Jing Xiao 0006, 
Speech2Video: Cross-Modal Distillation for Speech to Video Generation.
Interspeech2021
Shijing Si, Jianzong Wang, Huiming Sun, Jianhan Wu, Chuanyao Zhang, Xiaoyang Qu, Ning Cheng 0001, Lei Chen, Jing Xiao 0006, 
Variational Information Bottleneck for Effective Low-Resource Audio Classification.
ICASSP2022
Zhengyang Chen, Sanyuan Chen, Yu Wu 0012, Yao Qian, Chengyi Wang 0002, Shujie Liu 0001, Yanmin Qian, Michael Zeng 0001, 
Large-Scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification.
ICASSP2022
Yiming Wang, Jinyu Li 0001, Heming Wang, Yao Qian, Chengyi Wang 0002, Yu Wu 0012, 
Wav2vec-Switch: Contrastive Learning from Original-Noisy Speech Pairs for Robust Speech Recognition.
ICASSP2022
Chengyi Wang 0002, Yu Wu 0012, Sanyuan Chen, Shujie Liu 0001, Jinyu Li 0001, Yao Qian, Zhenglu Yang, 
Improving Self-Supervised Learning for Speech Recognition with Intermediate Layer Supervision.
Interspeech2022
Sanyuan Chen, Yu Wu 0012, Chengyi Wang 0002, Shujie Liu 0001, Zhuo Chen 0006, Peidong Wang, Gang Liu, Jinyu Li 0001, Jian Wu 0027, Xiangzhan Yu, Furu Wei, 
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
Interspeech2022
Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
Interspeech2022
Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li 0001, Xie Chen 0001, Yu Wu 0012, Yifan Gong 0001, 
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.
Interspeech2022
Shuo Ren, Shujie Liu 0001, Yu Wu 0012, Long Zhou, Furu Wei, 
Speech Pre-training with Acoustic Piece.
Interspeech2022
Chengyi Wang 0002, Yiming Wang, Yu Wu 0012, Sanyuan Chen, Jinyu Li 0001, Shujie Liu 0001, Furu Wei, 
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training.
ACL2022
Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang 0002, Shuo Ren, Yu Wu 0012, Shujie Liu 0001, Tom Ko, Qing Li, Yu Zhang 0006, Zhihua Wei, Yao Qian, Jinyu Li 0001, Furu Wei, 
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing.
ICASSP2021
Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Takuya Yoshioka, Shujie Liu 0001, Jinyu Li 0001, Xiangzhan Yu, 
Don't Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer.
ICASSP2021
Xie Chen 0001, Yu Wu 0012, Zhenghao Wang, Shujie Liu 0001, Jinyu Li 0001, 
Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset.
ICASSP2021
Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Jinyu Li 0001, Takuya Yoshioka, Chengyi Wang 0002, Shujie Liu 0001, Ming Zhou 0001, 
Continuous Speech Separation with Conformer.
ICASSP2021
Xiong Xiao, Naoyuki Kanda, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao 0008, Gang Liu, Yu Wu 0012, Jian Wu 0027, Shujie Liu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.
Interspeech2021
Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Shujie Liu 0001, Jinyu Li 0001, Xiangzhan Yu, 
Ultra Fast Speech Separation Model with Teacher Student Learning.
Interspeech2021
Naoyuki Kanda, Guoli Ye, Yu Wu 0012, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone.
Interspeech2021
Zhong Meng, Yu Wu 0012, Naoyuki Kanda, Liang Lu 0001, Xie Chen 0001, Guoli Ye, Eric Sun, Jinyu Li 0001, Yifan Gong 0001, 
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition.
Interspeech2021
Eric Sun, Jinyu Li 0001, Zhong Meng, Yu Wu 0012, Jian Xue, Shujie Liu 0001, Yifan Gong 0001, 
Improving Multilingual Transformer Transducer Models by Reducing Language Confusions.
Interspeech2021
Jian Wu 0027, Zhuo Chen 0006, Sanyuan Chen, Yu Wu 0012, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, 
Investigation of Practical Aspects of Single Channel Speech Separation for ASR.
ICML2021
Chengyi Wang 0002, Yu Wu 0012, Yao Qian, Ken'ichi Kumatani, Shujie Liu 0001, Furu Wei, Michael Zeng 0001, Xuedong Huang 0001, 
UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data.
Interspeech2022
Tuan-Nam Nguyen, Ngoc-Quan Pham, Alexander Waibel, 
Accent Conversion using Pre-trained Model and Synthesized Data from Voice Conversion.
Interspeech2022
Ngoc-Quan Pham, Alexander Waibel, Jan Niehues, 
Adaptive multilingual speech recognition with pretrained models.
Interspeech2021
Thai-Son Nguyen, Sebastian Stüker, Alex Waibel, 
Super-Human Performance in Online Low-Latency Recognition of Conversational Speech.
Interspeech2021
Ngoc-Quan Pham, Tuan-Nam Nguyen, Sebastian Stüker, Alex Waibel, 
Efficient Weight Factorization for Multilingual Speech Recognition.
ICASSP2020
Thai-Son Nguyen, Sebastian Stüker, Jan Niehues, Alex Waibel, 
Improving Sequence-To-Sequence Speech Recognition Training with On-The-Fly Data Augmentation.
Interspeech2020
Thai-Son Nguyen, Ngoc-Quan Pham, Sebastian Stüker, Alex Waibel, 
High Performance Sequence-to-Sequence Model for Streaming Speech Recognition.
Interspeech2020
Ngoc-Quan Pham, Thanh-Le Ha, Tuan-Nam Nguyen, Thai-Son Nguyen, Elizabeth Salesky, Sebastian Stüker, Jan Niehues, Alex Waibel, 
Relative Positional Encoding for Speech Recognition and Direct Translation.
Interspeech2019
Ngoc-Quan Pham, Thai-Son Nguyen, Jan Niehues, Markus Müller 0001, Alex Waibel, 
Very Deep Self-Attention Networks for End-to-End Speech Recognition.
NAACL2019
Elizabeth Salesky, Matthias Sperber, Alexander Waibel, 
Fluent Translations from Disfluent Speech in End-to-End Speech Translation.
ICASSP2018
Markus Müller 0001, Sebastian Stüker, Alex Waibel, 
Multilingual Adaptation of RNN Based ASR Systems.
ICASSP2018
Thai-Son Nguyen, Sebastian Stüker, Alex Waibel, 
Exploring Ctc-Network Derived Features with Conventional Hybrid System.
Interspeech2018
Markus Müller 0001, Sebastian Stüker, Alex Waibel, 
Neural Language Codes for Multilingual Acoustic Models.
Interspeech2018
Maren Kucza, Jan Niehues, Thomas Zenkel, Alex Waibel, Sebastian Stüker, 
Term Extraction via Neural Sequence Labeling a Comparative Evaluation of Strategies Using Recurrent Neural Networks.
Interspeech2018
Jan Niehues, Ngoc-Quan Pham, Thanh-Le Ha, Matthias Sperber, Alex Waibel, 
Low-Latency Neural Speech Translation.
Interspeech2018
Matthias Sperber, Jan Niehues, Graham Neubig, Sebastian Stüker, Alex Waibel, 
Self-Attentional Acoustic Models.
Interspeech2018
Thomas Zenkel, Ramon Sanabria, Florian Metze, Alex Waibel, 
Subword and Crossword Units for CTC Acoustic Models.
SpeechComm2017
Matthias Sperber, Graham Neubig, Jan Niehues, Satoshi Nakamura 0001, Alex Waibel, 
Transcribing against time.
ICASSP2017
Markus Müller 0001, Jörg Franke, Alex Waibel, Sebastian Stüker, 
Towards phoneme inventory discovery for documentation of unwritten languages.
Interspeech2017
Eunah Cho, Jan Niehues, Alex Waibel, 
NMT-Based Segmentation and Punctuation Insertion for Real-Time Spoken Language Translation.
Interspeech2017
Robin Ruede, Markus Müller 0001, Sebastian Stüker, Alex Waibel, 
Enhancing Backchannel Prediction Using Word Embeddings.
Interspeech2022
Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari, 
Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis.
Interspeech2022
Yuto Nishimura, Yuki Saito, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari, 
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History.
Interspeech2022
Takaaki Saeki, Shinnosuke Takamichi, Tomohiko Nakamura, Naoko Tanji, Hiroshi Saruwatari, 
SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling.
Interspeech2022
Takaaki Saeki, Detai Xin, Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari, 
UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022.
Interspeech2022
Yuki Saito, Yuto Nishimura, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari, 
STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent.
Interspeech2022
Shinnosuke Takamichi, Wataru Nakata, Naoko Tanji, Hiroshi Saruwatari, 
J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis.
SpeechComm2021
Masashi Aso, Shinnosuke Takamichi, Norihiro Takamune, Hiroshi Saruwatari, 
Acoustic model-based subword tokenization and prosodic-context extraction without language knowledge for text-to-speech synthesis.
TASLP2021
Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Perceptual-Similarity-Aware Deep Speaker Representation Learning for Multi-Speaker Generative Modeling.
ICASSP2021
Detai Xin, Tatsuya Komatsu, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Disentangled Speaker and Language Representations Using Mutual Information Minimization and Domain Adaptation for Cross-Lingual TTS.
Interspeech2021
Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari, 
Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis.
ICASSP2020
Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Lifter Training and Sub-Band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials.
Interspeech2020
Masashi Aso, Shinnosuke Takamichi, Hiroshi Saruwatari, 
End-to-End Text-to-Speech Synthesis with Unaligned Multiple Language Units Based on Attention.
Interspeech2020
Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU.
Interspeech2020
Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari, 
Cross-Lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space.
Interspeech2020
Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari, 
Investigating Effective Additional Contextual Factors in DNN-Based Spontaneous Speech Synthesis.
ICASSP2019
Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari, 
Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking.
Interspeech2019
Ivan Halim Parmonangan, Hiroki Tanaka, Sakriani Sakti, Shinnosuke Takamichi, Satoshi Nakamura 0001, 
Speech Quality Evaluation of Synthesized Japanese Speech Using EEG.
TASLP2018
Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks.
ICASSP2018
Yuki Saito, Yusuke Ijima, Kyosuke Nishida, Shinnosuke Takamichi, 
Non-Parallel Voice Conversion Using Variational Autoencoders Conditioned by Phonetic Posteriorgrams and D-Vectors.
ICASSP2018
Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Text-to-Speech Synthesis Using STFT Spectra Based on Low-/Multi-Resolution Generative Adversarial Networks.
ICASSP2022
Yadong Guan, Jiabin Xue, Guibin Zheng, Jiqing Han, 
Sparse Self-Attention for Semi-Supervised Sound Event Detection.
ICASSP2022
Jianchen Li, Jiqing Han, Hongwei Song, 
CDMA: Cross-Domain Distance Metric Adaptation for Speaker Verification.
Interspeech2022
Fan Qian, Hongwei Song, Jiqing Han, 
Word-wise Sparse Attention for Multimodal Sentiment Analysis.
Interspeech2021
Jianchen Li, Jiqing Han, Hongwei Song, 
Gradient Regularization for Noise-Robust Speaker Verification.
Interspeech2021
Fan Qian, Jiqing Han, 
Multimodal Sentiment Analysis with Temporal Modality Attention.
Interspeech2021
Jiabin Xue, Tieran Zheng, Jiqing Han, 
Model-Agnostic Fast Adaptive Multi-Objective Balancing Algorithm for Multilingual Automatic Speech Recognition Model Training.
TASLP2020
Zhihao Du, Xueliang Zhang 0001, Jiqing Han, 
A Joint Framework of Denoising Autoencoder and Generative Vocoder for Monaural Speech Enhancement.
TASLP2020
Hui Luo, Jiqing Han, 
Nonnegative Matrix Factorization Based Transfer Subspace Learning for Cross-Corpus Speech Emotion Recognition.
ICASSP2020
Chen Chen, Jiqing Han, 
TDMF: Task-Driven Multilevel Framework for End-to-End Speaker Verification.
ICASSP2020
Zhihao Du, Ming Lei, Jiqing Han, Shiliang Zhang, 
Pan: Phoneme-Aware Network for Monaural Speech Enhancement.
ICASSP2020
Jiabin Xue, Tieran Zheng, Jiqing Han, 
Structured Sparse Attention for end-to-end Automatic Speech Recognition.
Interspeech2020
Zhihao Du, Jiqing Han, Xueliang Zhang 0001, 
Double Adversarial Network Based Monaural Speech Enhancement for Robust Speech Recognition.
Interspeech2020
Zhihao Du, Ming Lei, Jiqing Han, Shiliang Zhang, 
Self-Supervised Adversarial Multi-Task Learning for Vocoder-Based Monaural Speech Enhancement.
Interspeech2020
Ziqiang Shi, Rujie Liu, Jiqing Han, 
Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss.
Interspeech2020
Liwen Zhang, Jiqing Han, Ziqiang Shi, 
ATReSN-Net: Capturing Attentive Temporal Relations in Semantic Neighborhood for Acoustic Scene Classification.
ICASSP2019
Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Shoji Hayakawa, Jiqing Han, 
Furcax: End-to-end Monaural Speech Separation Based on Deep Gated (De)convolutional Neural Networks with Adversarial Example Training.
Interspeech2019
Hui Luo, Jiqing Han, 
Cross-Corpus Speech Emotion Recognition Using Semi-Supervised Transfer Non-Negative Matrix Factorization with Adaptation Regularization.
Interspeech2019
Qiuying Shi, Hui Luo, Jiqing Han, 
Subspace Pooling Based Temporal Features Extraction for Audio Event Recognition.
Interspeech2019
Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Shoji Hayakawa, Shouji Harada, Jiqing Han, 
End-to-End Monaural Speech Separation with Multi-Scale Dynamic Weighted Gated Dilated Convolutional Pyramid Network.
Interspeech2019
Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Jiqing Han, Anyan Shi, 
Deep Attention Gated Dilated Temporal Convolutional Networks with Intra-Parallel Convolutional Modules for End-to-End Monaural Speech Separation.
Interspeech2022
Kartik Audhkhasi, Yinghui Huang, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition.
Interspeech2021
Kartik Audhkhasi, Tongzhou Chen, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Mixture Model Attention: Flexible Streaming and Non-Streaming Automatic Speech Recognition.
Interspeech2021
Andrew Rouditchenko, Angie W. Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas 0001, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogério Schmidt Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba 0001, James R. Glass, 
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.
Interspeech2021
Hainan Xu, Kartik Audhkhasi, Yinghui Huang, Jesse Emond, Bhuvana Ramabhadran, 
Regularizing Word Segmentation by Creating Misspellings.
ICASSP2020
Yinghui Huang, Hong-Kwang Kuo, Samuel Thomas 0001, Zvi Kons, Kartik Audhkhasi, Brian Kingsbury, Ron Hoory, Michael Picheny, 
Leveraging Unpaired Text Data for Training End-To-End Speech-to-Intent Systems.
Interspeech2020
Samuel Thomas 0001, Kartik Audhkhasi, Brian Kingsbury, 
Transliteration Based Data Augmentation for Training Multilingual ASR Acoustic Models in Low Resource Settings.
Interspeech2020
Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas 0001, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis A. Lastras, 
End-to-End Spoken Language Understanding Without Full Transcripts.
Interspeech2020
Zoltán Tüske, George Saon, Kartik Audhkhasi, Brian Kingsbury, 
Single Headed Attention Based Sequence-to-Sequence Model for State-of-the-Art Results on Switchboard.
ICASSP2019
George Saon, Zoltán Tüske, Kartik Audhkhasi, Brian Kingsbury, 
Sequence Noise Injected Training for End-to-end Speech Recognition.
ICASSP2019
Shane Settle, Kartik Audhkhasi, Karen Livescu, Michael Picheny, 
Acoustically Grounded Word Embeddings for Improved Acoustics-to-word Speech Recognition.
Interspeech2019
Kartik Audhkhasi, George Saon, Zoltán Tüske, Brian Kingsbury, Michael Picheny, 
Forget a Bit to Learn Better: Soft Forgetting for CTC-Based Automatic Speech Recognition.
Interspeech2019
Gakuto Kurata, Kartik Audhkhasi, 
Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation.
Interspeech2019
Gakuto Kurata, Kartik Audhkhasi, 
Multi-Task CTC Training with Auxiliary Feature Reconstruction for End-to-End Speech Recognition.
Interspeech2019
Michael Picheny, Zoltán Tüske, Brian Kingsbury, Kartik Audhkhasi, Xiaodong Cui, George Saon, 
Challenging the Boundaries of Speech Recognition: The MALACH Corpus.
Interspeech2019
Samuel Thomas 0001, Kartik Audhkhasi, Zoltán Tüske, Yinghui Huang, Michael Picheny, 
Detection and Recovery of OOVs for Improved English Broadcast News Captioning.
Interspeech2019
Zoltán Tüske, Kartik Audhkhasi, George Saon, 
Advancing Sequence-to-Sequence Based Speech Recognition.
ICASSP2018
Kartik Audhkhasi, Brian Kingsbury, Bhuvana Ramabhadran, George Saon, Michael Picheny, 
Building Competitive Direct Acoustics-to-Word Models for English Conversational Speech Recognition.
ICASSP2018
Xuesong Yang, Kartik Audhkhasi, Andrew Rosenberg, Samuel Thomas 0001, Bhuvana Ramabhadran, Mark Hasegawa-Johnson, 
Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition.
ICASSP2017
Kartik Audhkhasi, Andrew Rosenberg, Abhinav Sethy, Bhuvana Ramabhadran, Brian Kingsbury, 
End-to-end ASR-free keyword search from speech.
ICASSP2017
Andrew Rosenberg, Kartik Audhkhasi, Abhinav Sethy, Bhuvana Ramabhadran, Michael Picheny, 
End-to-end speech recognition and keyword search on low-resource languages.
ICASSP2022
Zhong-Qiu Wang, DeLiang Wang, 
Localization based Sequential Grouping for Continuous Speech Separation.
Interspeech2022
Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao 0001, Yanmin Qian, Shinji Watanabe 0001, 
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.
TASLP2021
Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux, 
Convolutive Prediction for Monaural Speech Dereverberation and Noisy-Reverberant Speaker Separation.
ICASSP2021
Zhong-Qiu Wang, DeLiang Wang, 
Count And Separate: Incorporating Speaker Counting For Continuous Speaker Separation.
TASLP2020
Hassan Taherian, Zhong-Qiu Wang, Jorge Chang, DeLiang Wang, 
Robust Speaker Recognition Based on Single-Channel and Multi-Channel Speech Enhancement.
TASLP2020
Zhong-Qiu Wang, DeLiang Wang, 
Deep Learning Based Target Cancellation for Speech Dereverberation.
TASLP2019
Zhong-Qiu Wang, DeLiang Wang, 
Combining Spectral and Spatial Features for Deep Learning Based Blind Speaker Separation.
TASLP2019
Zhong-Qiu Wang, Xueliang Zhang 0001, DeLiang Wang, 
Robust Speaker Localization Guided by Deep Learning-Based Time-Frequency Masking.
TASLP2019
Yan Zhao 0010, Zhong-Qiu Wang, DeLiang Wang, 
Two-Stage Deep Learning for Noisy-Reverberant Speech Enhancement.
ICASSP2019
Zhong-Qiu Wang, Ke Tan 0001, DeLiang Wang, 
Deep Learning Based Phase Reconstruction for Speaker Separation: A Trigonometric Perspective.
Interspeech2019
Hassan Taherian, Zhong-Qiu Wang, DeLiang Wang, 
Deep Learning Based Multi-Channel Speaker Recognition in Noisy and Reverberant Environments.
ICASSP2018
Zhong-Qiu Wang, Jonathan Le Roux, John R. Hershey, 
Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation.
ICASSP2018
Zhong-Qiu Wang, Jonathan Le Roux, John R. Hershey, 
Alternative Objective Functions for Deep Clustering.
ICASSP2018
Zhong-Qiu Wang, DeLiang Wang, 
Mask Weighted Stft Ratios for Relative Transfer Function Estimation and ITS Application to Robust ASR.
Interspeech2018
Zhong-Qiu Wang, Jonathan Le Roux, DeLiang Wang, John R. Hershey, 
End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction.
Interspeech2018
Zhong-Qiu Wang, DeLiang Wang, 
Integrating Spectral and Spatial Features for Multi-Channel Speaker Separation.
Interspeech2018
Zhong-Qiu Wang, DeLiang Wang, 
All-Neural Multi-Channel Speech Enhancement.
Interspeech2018
Zhong-Qiu Wang, Xueliang Zhang 0001, DeLiang Wang, 
Robust TDOA Estimation Based on Time-Frequency Masking and Deep Neural Networks.
ICASSP2017
Zhong-Qiu Wang, Ivan Tashev, 
Learning utterance-level representations for speech emotion and age/gender recognition using deep neural networks.
ICASSP2017
Zhong-Qiu Wang, DeLiang Wang, 
Recurrent deep stacking networks for supervised speech separation.
TASLP2021
Yi Luo 0004, Cong Han, Nima Mesgarani, 
Group Communication With Context Codec for Lightweight Source Separation.
ICASSP2021
Chenxing Li, Jiaming Xu 0001, Nima Mesgarani, Bo Xu 0002, 
Speaker and Direction Inferred Dual-Channel Speech Separation.
ICASSP2021
Yi Luo, Zhuo Chen, Cong Han, Chenda Li, Tianyan Zhou, Nima Mesgarani, 
Rethinking The Separation Layers In Speech Separation Networks.
ICASSP2021
Yi Luo, Cong Han, Nima Mesgarani, 
Ultra-Lightweight Speech Separation Via Group Communication.
Interspeech2021
Cong Han, Yi Luo 0004, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe 0001, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen 0006, 
Continuous Speech Separation Using Speaker Inventory for Long Recording.
Interspeech2021
Cong Han, Yi Luo, Nima Mesgarani, 
Binaural Speech Separation of Moving Speakers With Preserved Spatial Cues.
Interspeech2021
Yinghao Aaron Li, Ali Zare, Nima Mesgarani, 
StarGANv2-VC: A Diverse, Unsupervised, Non-Parallel Framework for Natural-Sounding Voice Conversion.
Interspeech2021
Yi Luo, Cong Han, Nima Mesgarani, 
Empirical Analysis of Generalized Iterative Speech Separation Networks.
Interspeech2021
Yi Luo, Nima Mesgarani, 
Implicit Filter-and-Sum Network for End-to-End Multi-Channel Speech Separation.
NeurIPS2021
Menoua Keshishian, Samuel Norman-Haignere, Nima Mesgarani, 
Understanding Adaptive, Multiscale Temporal Integration In Deep Speech Recognition Systems.
ICASSP2020
Cong Han, Yi Luo 0004, Nima Mesgarani, 
Real-Time Binaural Speech Separation with Preserved Spatial Cues.
Interspeech2020
Yi Luo 0004, Nima Mesgarani, 
Separating Varying Numbers of Sources with Auxiliary Autoencoding Loss.
TASLP2019
Yi Luo 0004, Nima Mesgarani, 
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation.
TASLP2018
Yi Luo 0004, Zhuo Chen 0006, Nima Mesgarani, 
Speaker-Independent Speech Separation With Deep Attractor Network.
ICASSP2018
Yi Luo 0004, Nima Mesgarani, 
TaSNet: Time-Domain Audio Separation Network for Real-Time, Single-Channel Speech Separation.
ICASSP2018
Hassan Akbari, Himani Arora, Liangliang Cao, Nima Mesgarani, 
Lip2Audspec: Speech Reconstruction from Silent Lip Movements Video.
Interspeech2018
Yi Luo 0004, Nima Mesgarani, 
Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network.
Interspeech2018
Rajath Kumar, Yi Luo 0004, Nima Mesgarani, 
Music Source Activity Detection and Separation Using Deep Attractor Network.
Interspeech2018
Nima Mesgarani, 
Speech Processing in the Human Brain Meets Deep Learning.
ICASSP2017
Bahar Khalighinejad, Tasha Nagamine, Ashesh D. Mehta, Nima Mesgarani, 
NAPLib: An open source toolbox for real-time and offline Neural Acoustic Processing.
ICASSP2022
Takafumi Moriya, Takanori Ashihara, Atsushi Ando, Hiroshi Sato, Tomohiro Tanaka, Kohei Matsuura, Ryo Masumura, Marc Delcroix, Takahiro Shinozaki, 
Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration.
ICASSP2022
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Naoyuki Kamo, Takafumi Moriya, 
Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition.
Interspeech2022
Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, 
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models.
Interspeech2022
Ryo Masumura, Yoshihiro Yamazaki, Saki Mizuno, Naoki Makishima, Mana Ihori, Mihiro Uchida, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Shota Orihashi, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando, 
End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training.
Interspeech2022
Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki, 
Streaming Target-Speaker ASR with Neural Transducer.
Interspeech2022
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura, 
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.
Interspeech2022
Tomohiro Tanaka, Ryo Masumura, Hiroshi Sato, Mana Ihori, Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, 
Domain Adversarial Self-Supervised Speech Representation Learning for Improving Unknown Domain Downstream Tasks.
ICASSP2021
Atsushi Ando, Ryo Masumura, Hiroshi Sato, Takafumi Moriya, Takanori Ashihara, Yusuke Ijima, Tomoki Toda, 
Speech Emotion Recognition Based on Listener Adaptive Models.
ICASSP2021
Takafumi Moriya, Takanori Ashihara, Tomohiro Tanaka, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Yusuke Ijima, Ryo Masumura, Yusuke Shinohara, 
Simpleflat: A Simple Whole-Network Pre-Training Approach for RNN Transducer-Based End-to-End Speech Recognition.
Interspeech2021
Takanori Ashihara, Takafumi Moriya, Makio Kashino, 
Investigating the Impact of Spectral and Temporal Degradation on End-to-End Automatic Speech Recognition Performance.
Interspeech2021
Takafumi Moriya, Tomohiro Tanaka, Takanori Ashihara, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Ryo Masumura, Marc Delcroix, Taichi Asami, 
Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture.
Interspeech2021
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoyuki Kamo, 
Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition.
Interspeech2021
Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Takafumi Moriya, Takanori Ashihara, Shota Orihashi, Naoki Makishima, 
Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition.
ICASSP2020
Ryo Masumura, Mana Ihori, Akihiko Takashima, Takafumi Moriya, Atsushi Ando, Yusuke Shinohara, 
Sequence-Level Consistency Training for Semi-Supervised End-to-End Automatic Speech Recognition.
ICASSP2020
Takafumi Moriya, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara, 
Distilling Attention Weights for CTC-Based ASR Systems.
Interspeech2020
Takafumi Moriya, Tsubasa Ochiai, Shigeki Karita, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara, Marc Delcroix, 
Self-Distillation for Improving CTC-Transformer-Based ASR Systems.
TASLP2019
Takafumi Moriya, Tomohiro Tanaka, Takahiro Shinozaki, Shinji Watanabe 0001, Kevin Duh, 
Evolution-Strategy-Based Automation of System Development for High-Performance Speech Recognition.
ICASSP2019
Ryo Masumura, Tomohiro Tanaka, Takafumi Moriya, Yusuke Shinohara, Takanobu Oba, Yushi Aono, 
Large Context End-to-end Automatic Speech Recognition via Extension of Hierarchical Recurrent Encoder-decoder Models.
Interspeech2019
Takanori Ashihara, Yusuke Shinohara, Hiroshi Sato, Takafumi Moriya, Kiyoaki Matsui, Takaaki Fukutomi, Yoshikazu Yamaguchi, Yushi Aono, 
Neural Whispered Speech Detection with Imbalanced Learning.
Interspeech2019
Ryo Masumura, Hiroshi Sato, Tomohiro Tanaka, Takafumi Moriya, Yusuke Ijima, Takanobu Oba, 
End-to-End Automatic Speech Recognition with a Reconstruction Criterion Using Speech-to-Text and Text-to-Speech Encoder-Decoders.
Interspeech2022
Bei Liu, Zhengyang Chen, Shuai Wang 0016, Haoyu Wang, Bing Han, Yanmin Qian, 
DF-ResNet: Boosting Speaker Verification Performance with Depth-First Design.
TASLP2021
Heinrich Dinkel, Shuai Wang 0016, Xuenan Xu, Mengyue Wu, Kai Yu 0004, 
Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training.
TASLP2021
Yanmin Qian, Zhengyang Chen, Shuai Wang 0016, 
Audio-Visual Deep Neural Network for Robust Person Verification.
ICASSP2021
Zhengyang Chen, Shuai Wang 0016, Yanmin Qian, 
Self-Supervised Learning Based Domain Adaptation for Robust Speaker Verification.
ICASSP2021
Chenpeng Du, Bing Han, Shuai Wang 0016, Yanmin Qian, Kai Yu 0004, 
SynAug: Synthesis-Based Data Augmentation for Text-Dependent Speaker Verification.
ICASSP2021
Houjun Huang, Xu Xiang, Fei Zhao, Shuai Wang 0016, Yanmin Qian, 
Unit Selection Synthesis Based Data Augmentation for Fixed Phrase Speaker Verification.
TASLP2020
Shuai Wang 0016, Yexin Yang, Zhanghao Wu, Yanmin Qian, Kai Yu 0004, 
Data Augmentation Using Deep Generative Models for Embedding Based Speaker Recognition.
ICASSP2020
Shuai Wang 0016, Johan Rohdin, Oldrich Plchot, Lukás Burget, Kai Yu 0004, Jan Cernocký, 
Investigation of Specaugment for Deep Speaker Embedding Learning.
ICASSP2020
Zhengyang Chen, Shuai Wang 0016, Yanmin Qian, Kai Yu 0004, 
Channel Invariant Speaker Embedding Learning with Joint Multi-Task and Adversarial Training.
ICASSP2020
Mireia Díez, Lukás Burget, Federico Landini, Shuai Wang 0016, Honza Cernocký, 
Optimizing Bayesian Hmm Based X-Vector Clustering for the Second Dihard Speech Diarization Challenge.
ICASSP2020
Federico Landini, Shuai Wang 0016, Mireia Díez, Lukás Burget, Pavel Matejka, Katerina Zmolíková, Ladislav Mosner, Anna Silnova, Oldrich Plchot, Ondrej Novotný, Hossein Zeinali, Johan Rohdin, 
But System for the Second Dihard Speech Diarization Challenge.
Interspeech2020
Zhengyang Chen, Shuai Wang 0016, Yanmin Qian, 
Multi-Modality Matters: A Performance Leap on VoxCeleb.
Interspeech2020
Zhengyang Chen, Shuai Wang 0016, Yanmin Qian, 
Adversarial Domain Adaptation for Speaker Verification Using Partially Shared Network.
Interspeech2020
Hongji Wang, Heinrich Dinkel, Shuai Wang 0016, Yanmin Qian, Kai Yu 0004, 
Dual-Adversarial Domain Adaptation for Generalized Replay Attack Detection.
TASLP2019
Shuai Wang 0016, Zili Huang, Yanmin Qian, Kai Yu 0004, 
Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification.
ICASSP2019
Shuai Wang 0016, Yexin Yang, Tianzhe Wang, Yanmin Qian, Kai Yu 0004, 
Knowledge Distillation for Small Foot-print Deep Speaker Embedding.
Interspeech2019
Mireia Díez, Lukás Burget, Shuai Wang 0016, Johan Rohdin, Jan Cernocký, 
Bayesian HMM Based x-Vector Clustering for Speaker Diarization.
Interspeech2019
Hongji Wang, Heinrich Dinkel, Shuai Wang 0016, Yanmin Qian, Kai Yu 0004, 
Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial Training.
Interspeech2019
Shuai Wang 0016, Johan Rohdin, Lukás Burget, Oldrich Plchot, Yanmin Qian, Kai Yu 0004, Jan Cernocký, 
On the Usage of Phonetic Information for Text-Independent Speaker Embedding Extraction.
Interspeech2019
Zhanghao Wu, Shuai Wang 0016, Yanmin Qian, Kai Yu 0004, 
Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification.
ICASSP2022
Cong Cai, Bin Liu 0041, Jianhua Tao, Zhengkun Tian, Jiahao Lu, Kexin Wang, 
End-to-End Network Based on Transformer for Automatic Detection of Covid-19.
TASLP2021
Cunhang Fan, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Bin Liu 0041, Zhengqi Wen, 
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition.
TASLP2021
Yongwei Li, Jianhua Tao, Donna Erickson, Bin Liu 0041, Masato Akagi, 
F0-Noise-Robust Glottal Source and Vocal Tract Analysis Based on ARX-LF Model.
TASLP2021
Zheng Lian, Bin Liu 0041, Jianhua Tao, 
CTNet: Conversational Transformer Network for Emotion Recognition.
ICASSP2021
Licai Sun, Bin Liu 0041, Jianhua Tao, Zheng Lian, 
Multimodal Cross- and Self-Attention Network for Speech Emotion Recognition.
Interspeech2021
Cong Cai, Mingyue Niu, Bin Liu 0041, Jianhua Tao, Xuefei Liu, 
TDCA-Net: Time-Domain Channel Attention Network for Depression Detection.
ICASSP2020
Jian Huang 0014, Jianhua Tao, Bin Liu 0041, Zheng Lian, Mingyue Niu, 
Multimodal Transformer Fusion for Continuous Emotion Recognition.
Interspeech2020
Cunhang Fan, Jianhua Tao, Bin Liu 0041, Jiangyan Yi, Zhengqi Wen, 
Gated Recurrent Fusion of Spatial and Spectral Features for Multi-Channel Speech Separation with Deep Embedding Representations.
Interspeech2020
Cunhang Fan, Jianhua Tao, Bin Liu 0041, Jiangyan Yi, Zhengqi Wen, 
Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations.
Interspeech2020
Jian Huang 0014, Jianhua Tao, Bin Liu 0041, Zheng Lian, 
Learning Utterance-Level Representations with Label Smoothing for Speech Emotion Recognition.
Interspeech2020
Yongwei Li, Jianhua Tao, Bin Liu 0041, Donna Erickson, Masato Akagi, 
Comparison of Glottal Source Parameter Values in Emotional Vowels.
Interspeech2020
Zheng Lian, Jianhua Tao, Bin Liu 0041, Jian Huang 0014, Zhanlei Yang, Rongjun Li, 
Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition.
Interspeech2020
Zheng Lian, Jianhua Tao, Bin Liu 0041, Jian Huang 0014, Zhanlei Yang, Rongjun Li, 
Conversational Emotion Recognition Using Self-Attention Mechanisms and Graph Neural Networks.
Interspeech2020
Ziping Zhao 0001, Qifei Li, Nicholas Cummins, Bin Liu 0041, Haishuai Wang, Jianhua Tao, Björn W. Schuller, 
Hybrid Network Feature Extraction for Depression Assessment from Speech.
ICASSP2019
Bin Liu 0041, Shuai Nie, Yaping Zhang, Shan Liang, Zhanlei Yang, Wenju Liu, 
Focal Loss and Double-edge-triggered Detector for Robust Small-footprint Keyword Spotting.
Interspeech2019
Cunhang Fan, Bin Liu 0041, Jianhua Tao, Jiangyan Yi, Zhengqi Wen, 
Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features.
Interspeech2019
Zheng Lian, Jianhua Tao, Bin Liu 0041, Jian Huang 0014, 
Conversational Emotion Analysis via Attention Mechanisms.
Interspeech2019
Zheng Lian, Jianhua Tao, Bin Liu 0041, Jian Huang 0014, 
Unsupervised Representation Learning with Future Observation Prediction for Speech Emotion Recognition.
Interspeech2019
Bin Liu 0041, Shuai Nie, Shan Liang, Wenju Liu, Meng Yu 0003, Lianwu Chen, Shouye Peng, Changliang Li, 
Jointly Adversarial Enhancement Training for Robust End-to-End Speech Recognition.
Interspeech2019
Mingyue Niu, Jianhua Tao, Bin Liu 0041, Cunhang Fan, 
Automatic Depression Level Detection via ℓp-Norm Pooling.
ICASSP2022
Antoine Bruguier, Duc Le, Rohit Prabhavalkar, Dangna Li, Zhe Liu 0011, Bo Wang, Eun Chang, Fuchun Peng, Ozlem Kalinli, Michael L. Seltzer, 
Neural-FST Class Language Model for End-to-End Speech Recognition.
ICASSP2022
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.
Interspeech2022
Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang 0016, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman, 
A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes.
Interspeech2022
Ke Hu, Tara N. Sainath, Yanzhang He, Rohit Prabhavalkar, Trevor Strohman, Sepand Mavandadi, Weiran Wang, 
Improving Deliberation by Text-Only and Semi-Supervised Training.
Interspeech2022
W. Ronny Huang, Shuo-Yiin Chang, David Rybach, Tara N. Sainath, Rohit Prabhavalkar, Cal Peyser, Zhiyun Lu, Cyril Allauzen, 
E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR.
Interspeech2022
Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, W. Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach, 
Improving Rare Word Recognition with LM-aware MWER Training.
ICASSP2021
Nathan Howard, Alex Park 0001, Turaj Zakizadeh Shabestary, Alexander Gruenstein, Rohit Prabhavalkar, 
A Neural Acoustic Echo Canceller Optimized Using An Automatic Speech Recognizer and Large Scale Synthetic Data.
ICASSP2021
David Qiu, Qiujia Li, Yanzhang He, Yu Zhang 0033, Bo Li 0028, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li 0133, Ke Hu, Tara N. Sainath, Ian McGraw, 
Learning Word-Level Confidence for Subword End-To-End ASR.
Interspeech2021
Yuan Shangguan, Rohit Prabhavalkar, Hang Su, Jay Mahadeokar, Yangyang Shi, Jiatong Zhou, Chunyang Wu, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer, 
Dissecting User-Perceived Latency of On-Device E2E Speech Recognition.
Interspeech2021
Yangyang Shi, Varun Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, Rohit Prabhavalkar, Alex Xiao, Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer, 
Dynamic Encoder Transducer: A Flexible Solution for Trading Off Accuracy for Latency.
ICASSP2020
Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prabhavalkar, 
Deliberation Model Based Two-Pass End-To-End Speech Recognition.
ICASSP2020
Tara N. Sainath, Yanzhang He, Bo Li 0028, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li 0133, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alexander Gruenstein, Ke Hu, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirkó Visontai, Yonghui Wu, Yu Zhang 0033, Ding Zhao, 
A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency.
Interspeech2020
Antoine Bruguier, Ananya Misra, Arun Narayanan, Rohit Prabhavalkar, 
Anti-Aliasing Regularization in Stacking Layers.
ICASSP2019
Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, Qiao Liang, Deepti Bhatia, Yuan Shangguan, Bo Li 0028, Golan Pundak, Khe Chai Sim, Tom Bagby, Shuo-Yiin Chang, Kanishka Rao, Alexander Gruenstein, 
Streaming End-to-end Speech Recognition for Mobile Devices.
Interspeech2019
Ke Hu, Antoine Bruguier, Tara N. Sainath, Rohit Prabhavalkar, Golan Pundak, 
Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models.
Interspeech2019
Kazuki Irie, Rohit Prabhavalkar, Anjuli Kannan, Antoine Bruguier, David Rybach, Patrick Nguyen, 
On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition.
Interspeech2019
Tara N. Sainath, Ruoming Pang, David Rybach, Yanzhang He, Rohit Prabhavalkar, Wei Li 0133, Mirkó Visontai, Qiao Liang, Trevor Strohman, Yonghui Wu, Ian McGraw, Chung-Cheng Chiu, 
Two-Pass End-to-End Speech Recognition.
ICASSP2018
Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Ekaterina Gonina, Navdeep Jaitly, Bo Li 0028, Jan Chorowski, Michiel Bacchiani, 
State-of-the-Art Speech Recognition with Sequence-to-Sequence Models.
ICASSP2018
Chris Donahue, Bo Li 0028, Rohit Prabhavalkar, 
Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition.
ICASSP2018
Tara N. Sainath, Rohit Prabhavalkar, Shankar Kumar, Seungji Lee, Anjuli Kannan, David Rybach, Vlad Schogol, Patrick Nguyen, Bo Li 0028, Yonghui Wu, Zhifeng Chen, Chung-Cheng Chiu, 
No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models.
ICASSP2022
Efthymios Tzinis, Yossi Adi, Vamsi K. Ithapu, Buye Xu, Anurag Kumar 0003, 
Continual Self-Training With Bootstrapped Remixing For Speech Enhancement.
Interspeech2022
Shahaf Bassan, Yossi Adi, Jeffrey S. Rosenschein, 
Unsupervised Symbolic Music Segmentation using Ensemble Temporal Prediction Errors.
Interspeech2022
Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino 0001, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee 0001, 
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation.
Interspeech2022
Maureen de Seyssel, Marvin Lavechin, Yossi Adi, Emmanuel Dupoux, Guillaume Wisniewski, 
Probing phoneme, language and speaker information in unsupervised speech representations.
Interspeech2022
Or Tal, Moshe Mandel, Felix Kreuk, Yossi Adi, 
A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement.
Interspeech2022
Arnon Turetzky, Tzvi Michelson, Yossi Adi, Shmuel Peleg, 
Deep Audio Waveform Prior.
ACL2022
Eugene Kharitonov, Ann Lee 0001, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu, 
Text-Free Prosody-Aware Generative Spoken Language Modeling.
ACL2022
Ann Lee 0001, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Sravya Popuri, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang 0002, Juan Pino 0001, Wei-Ning Hsu, 
Direct Speech-to-Speech Translation With Discrete Units.
NAACL2022
Ann Lee 0001, Hongyu Gong, Paul-Ambroise Duquenne, Holger Schwenk, Peng-Jen Chen, Changhan Wang, Sravya Popuri, Yossi Adi, Juan Miguel Pino, Jiatao Gu, Wei-Ning Hsu, 
Textless Speech-to-Speech Translation on Real Data.
ICASSP2021
Shlomo E. Chazan, Lior Wolf, Eliya Nachmani, Yossi Adi, 
Single Channel Voice Separation for Unknown Number of Speakers Under Reverberant and Noisy Settings.
ICASSP2021
Adam Polyak, Lior Wolf, Yossi Adi, Ori Kabeli, Yaniv Taigman, 
High Fidelity Speech Regeneration with Application to Speech Enhancement.
Interspeech2021
Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, 
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.
ICASSP2020
Felix Kreuk, Yaniv Sheena, Joseph Keshet, Yossi Adi, 
Phoneme Boundary Detection Using Learnable Segmental Features.
Interspeech2020
Alexandre Défossez, Gabriel Synnaeve, Yossi Adi, 
Real Time Speech Enhancement in the Waveform Domain.
Interspeech2020
Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet, 
Hide and Speak: Towards Deep Neural Networks for Speech Steganography.
Interspeech2020
Felix Kreuk, Joseph Keshet, Yossi Adi, 
Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation.
Interspeech2020
Adam Polyak, Lior Wolf, Yossi Adi, Yaniv Taigman, 
Unsupervised Cross-Domain Singing Voice Conversion.
ICML2020
Eliya Nachmani, Yossi Adi, Lior Wolf, 
Voice Separation with an Unknown Number of Multiple Speakers.
ICASSP2019
Yossi Adi, Neil Zeghidour, Ronan Collobert, Nicolas Usunier, Vitaliy Liptchinsky, Gabriel Synnaeve, 
To Reverse the Gradient or Not: an Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition.
ICASSP2018
Felix Kreuk, Yossi Adi, Moustapha Cissé, Joseph Keshet, 
Fooling End-To-End Speaker Verification With Adversarial Examples.
ICASSP2022
Jee-weon Jung, Hee-Soo Heo, Hemlata Tak, Hye-jin Shim, Joon Son Chung, Bong-Jin Lee, Ha-Jin Yu, Nicholas W. D. Evans, 
AASIST: Audio Anti-Spoofing Using Integrated Spectro-Temporal Graph Attention Networks.
ICASSP2022
Namkyu Jung, Geonmin Kim, Joon Son Chung, 
Spell My Name: Keyword Boosted Speech Recognition.
ICASSP2022
Youngki Kwon, Hee-Soo Heo, Jee-Weon Jung, You Jin Kim, Bong-Jin Lee, Joon Son Chung, 
Multi-Scale Speaker Embedding-Based Graph Attention Networks For Speaker Diarisation.
Interspeech2022
Jee-weon Jung, You Jin Kim, Hee-Soo Heo, Bong-Jin Lee, Youngki Kwon, Joon Son Chung, 
Pushing the limits of raw waveform speaker recognition.
ICASSP2021
Andrew Brown 0006, Jaesung Huh, Arsha Nagrani, Joon Son Chung, Andrew Zisserman, 
Playing a Part: Speaker Verification at the movies.
ICASSP2021
Jee-weon Jung, Hee-Soo Heo, Ha-Jin Yu, Joon Son Chung, 
Graph Attention Networks for Speaker Verification.
ICASSP2021
Yoohwan Kwon, Hee-Soo Heo, Bong-Jin Lee, Joon Son Chung, 
The ins and outs of speaker recognition: lessons from VoxSRC 2020.
Interspeech2021
Jee-weon Jung, Hee-Soo Heo, Youngki Kwon, Joon Son Chung, Bong-Jin Lee, 
Three-Class Overlapped Speech Detection Using a Convolutional Recurrent Neural Network.
Interspeech2021
You Jin Kim, Hee-Soo Heo, Soyeon Choe, Soo-Whan Chung, Yoohwan Kwon, Bong-Jin Lee, Youngki Kwon, Joon Son Chung, 
Look Who's Talking: Active Speaker Detection in the Wild.
Interspeech2021
Youngki Kwon, Jee-weon Jung, Hee-Soo Heo, You Jin Kim, Bong-Jin Lee, Joon Son Chung, 
Adapting Speaker Embeddings for Speaker Diarisation.
ICASSP2020
Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman, 
ASR is All You Need: Cross-Modal Distillation for Lip Reading.
ICASSP2020
Seongkyu Mun, Soyeon Choe, Jaesung Huh, Joon Son Chung, 
The Sound of My Voice: Speaker Representation Loss for Target Voice Separation.
ICASSP2020
Arsha Nagrani, Joon Son Chung, Samuel Albanie, Andrew Zisserman, 
Disentangled Speech Embeddings Using Cross-Modal Self-Supervision.
Interspeech2020
Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman, 
Now You're Speaking My Language: Visual Language Identification.
Interspeech2020
Soo-Whan Chung, Soyeon Choe, Joon Son Chung, Hong-Goo Kang, 
FaceFilter: Audio-Visual Speech Separation Using Still Images.
Interspeech2020
Joon Son Chung, Jaesung Huh, Seongkyu Mun, Minjae Lee, Hee-Soo Heo, Soyeon Choe, Chiheon Ham, Sunghwan Jung, Bong-Jin Lee, Icksang Han, 
In Defence of Metric Learning for Speaker Recognition.
Interspeech2020
Joon Son Chung, Jaesung Huh, Arsha Nagrani, Triantafyllos Afouras, Andrew Zisserman, 
Spot the Conversation: Speaker Diarisation in the Wild.
Interspeech2020
Soo-Whan Chung, Hong-Goo Kang, Joon Son Chung, 
Seeing Voices and Hearing Voices: Learning Discriminative Embeddings Using Cross-Modal Self-Supervision.
ICASSP2019
Weidi Xie, Arsha Nagrani, Joon Son Chung, Andrew Zisserman, 
Utterance-level Aggregation for Speaker Recognition in the Wild.
Interspeech2019
Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman, 
My Lips Are Concealed: Audio-Visual Speech Enhancement Through Obstructions.
ICASSP2022
Antoine Bruguier, Duc Le, Rohit Prabhavalkar, Dangna Li, Zhe Liu 0011, Bo Wang, Eun Chang, Fuchun Peng, Ozlem Kalinli, Michael L. Seltzer, 
Neural-FST Class Language Model for End-to-End Speech Recognition.
Interspeech2022
Suyoun Kim, Duc Le, Weiyi Zheng, Tarun Singh, Abhinav Arora, Xiaoyu Zhai, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer, 
Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric.
Interspeech2022
Duc Le, Akshat Shrivastava, Paden D. Tomasello, Suyoun Kim, Aleksandr Livshits, Ozlem Kalinli, Michael L. Seltzer, 
Deliberation Model for On-Device Spoken Language Understanding.
Interspeech2022
Jay Mahadeokar, Yangyang Shi, Ke Li, Duc Le, Jiedan Zhu, Vikas Chandra, Ozlem Kalinli, Michael L. Seltzer, 
Streaming parallel transducer beam search with fast slow cascaded encoders.
ICASSP2021
Suyoun Kim, Yuan Shangguan, Jay Mahadeokar, Antoine Bruguier, Christian Fuegen, Michael L. Seltzer, Duc Le, 
Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer.
Interspeech2021
Suyoun Kim, Abhinav Arora, Duc Le, Ching-Feng Yeh, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer, 
Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding.
Interspeech2021
Duc Le, Mahaveer Jain, Gil Keren, Suyoun Kim, Yangyang Shi, Jay Mahadeokar, Julian Chan, Yuan Shangguan, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Michael L. Seltzer, 
Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion.
Interspeech2021
Jay Mahadeokar, Yangyang Shi, Yuan Shangguan, Chunyang Wu, Alex Xiao, Hang Su, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer, 
Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios.
Interspeech2021
Varun Nagaraja, Yangyang Shi, Ganesh Venkatesh, Ozlem Kalinli, Michael L. Seltzer, Vikas Chandra, 
Collaborative Training of Acoustic Encoders for Speech Recognition.
Interspeech2021
Yuan Shangguan, Rohit Prabhavalkar, Hang Su, Jay Mahadeokar, Yangyang Shi, Jiatong Zhou, Chunyang Wu, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer, 
Dissecting User-Perceived Latency of On-Device E2E Speech Recognition.
Interspeech2021
Yangyang Shi, Varun Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, Rohit Prabhavalkar, Alex Xiao, Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer, 
Dynamic Encoder Transducer: A Flexible Solution for Trading Off Accuracy for Latency.
ICASSP2020
Yi-Chen Chen, Zhaojun Yang, Ching-Feng Yeh, Mahaveer Jain, Michael L. Seltzer, 
Aipnet: Generative Adversarial Pre-Training of Accent-Invariant Networks for End-To-End Speech Recognition.
ICASSP2020
Duc Le, Thilo Köhler, Christian Fuegen, Michael L. Seltzer, 
G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR.
ICASSP2020
Yongqiang Wang 0005, Abdelrahman Mohamed, Duc Le, Chunxi Liu, Alex Xiao, Jay Mahadeokar, Hongzhao Huang, Andros Tjandra, Xiaohui Zhang, Frank Zhang 0001, Christian Fuegen, Geoffrey Zweig, Michael L. Seltzer, 
Transformer-Based Acoustic Modeling for Hybrid Speech Recognition.
Interspeech2020
Yangyang Shi, Yongqiang Wang 0005, Chunyang Wu, Christian Fuegen, Frank Zhang 0001, Duc Le, Ching-Feng Yeh, Michael L. Seltzer, 
Weak-Attention Suppression for Transformer Based Speech Recognition.
ICASSP2019
Zhehuai Chen, Mahaveer Jain, Yongqiang Wang 0005, Michael L. Seltzer, Christian Fuegen, 
End-to-end Contextual Speech Recognition Using Class Language Models and a Token Passing Decoder.
Interspeech2019
Zhehuai Chen, Mahaveer Jain, Yongqiang Wang 0005, Michael L. Seltzer, Christian Fuegen, 
Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR.
ICASSP2018
Suyoun Kim, Michael L. Seltzer, 
Towards Language-Universal End-to-End Speech Recognition.
Interspeech2018
Suyoun Kim, Michael L. Seltzer, Jinyu Li 0001, Rui Zhao 0017, 
Improved Training for Online End-to-end Speech Recognition Systems.
TASLP2017
Wayne Xiong, Jasha Droppo, Xuedong Huang 0001, Frank Seide, Michael L. Seltzer, Andreas Stolcke, Dong Yu 0001, Geoffrey Zweig, 
Toward Human Parity in Conversational Speech Recognition.
TASLP2022
Weiwei Lin 0002, Man-Wai Mak, 
Mixture Representation Learning for Deep Speaker Embedding.
TASLP2022
Youzhi Tu, Man-Wai Mak, 
Aggregating Frame-Level Information in the Spectral Domain With Self-Attention for Speaker Embedding.
ICASSP2022
Weiwei Lin 0002, Man-Wai Mak, 
Robust Speaker Verification Using Population-Based Data Augmentation.
ICASSP2022
Lu Yi, Man-Wai Mak, 
Disentangled Speaker Embedding for Robust Speaker Verification.
Interspeech2022
Zhenke Gao, Man-Wai Mak, Weiwei Lin 0002, 
UNet-DenseNet for Robust Far-Field Speaker Verification.
Interspeech2022
Xiaoquan Ke, Man-Wai Mak, Helen M. Meng, 
Automatic Selection of Discriminative Features for Dementia Detection in Cantonese-Speaking People.
ICASSP2021
Jinchao Li, Jianwei Yu, Zi Ye, Simon Wong, Man-Wai Mak, Brian Mak, Xunying Liu, Helen Meng, 
A Comparative Study of Acoustic and Linguistic Features Classification for Alzheimer's Disease Detection.
ICASSP2021
Youzhi Tu, Man-Wai Mak, 
Short-Time Spectral Aggregation for Speaker Embedding.
Interspeech2021
Youzhi Tu, Man-Wai Mak, 
Mutual Information Enhanced Training for Speaker Embedding.
TASLP2020
Weiwei Lin 0002, Man-Wai Mak, Na Li 0012, Dan Su 0002, Dong Yu 0001, 
A Framework for Adapting DNN Speaker Embedding Across Languages.
TASLP2020
Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien, 
Variational Domain Adversarial Learning With Mutual Information Maximization for Speaker Verification.
ICASSP2020
Weiwei Lin 0002, Man-Wai Mak, Na Li 0012, Dan Su 0002, Dong Yu 0001, 
Multi-Level Deep Neural Network Adaptation for Speaker Verification Using MMD and Consistency Regularization.
Interspeech2020
Wei-Wei Lin 0002, Man-Wai Mak, 
Wav2Spk: A Simple DNN Architecture for Learning Speaker Embeddings from Waveforms.
Interspeech2020
Weiwei Lin 0002, Man-Wai Mak, Jen-Tzung Chien, 
Strategies for End-to-End Text-Independent Speaker Verification.
Interspeech2020
Lu Yi, Man-Wai Mak, 
Adversarial Separation and Adaptation Network for Far-Field Speaker Verification.
ICASSP2019
Wei-Wei Lin 0002, Man-Wai Mak, Youzhi Tu, Jen-Tzung Chien, 
Semi-supervised Nuisance-attribute Networks for Domain Adaptation.
Interspeech2019
Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien, 
Variational Domain Adversarial Learning for Speaker Verification.
TASLP2018
Wei-Wei Lin 0002, Man-Wai Mak, Jen-Tzung Chien, 
Multisource I-Vectors Domain Adaptation Using Maximum Mean Discrepancy Based Autoencoders.
ICASSP2018
Longxin Li, Man-Wai Mak, 
Unsupervised Domain Adaptation for Gender-Aware PLDA Mixture Models.
TASLP2017
Na Li, Man-Wai Mak, Jen-Tzung Chien, 
DNN-Driven Mixture of PLDA for Robust Speaker Verification.
TASLP2021
Dörte Fischer, Simon Doclo, 
Robust Constrained MFMVDR Filters for Single-Channel Speech Enhancement Based on Spherical Uncertainty Set.
TASLP2020
Naveen Kumar Desiraju, Simon Doclo, Markus Buck, Tobias Wolff, 
Online Estimation of Reverberation Parameters For Late Residual Echo Suppression.
Interspeech2020
Felicitas Bederna, Henning F. Schepker, Christian Rollwage, Simon Doclo, Arne Pusch, Jörg Bitzer, Jan Rennies, 
Adaptive Compressive Onset-Enhancement for Improved Speech Intelligibility in Noise and Reverberation.
TASLP2019
Thomas Dietzen, Ann Spriet, Wouter Tirry, Simon Doclo, Marc Moonen, Toon van Waterschoot, 
Comparative Analysis of Generalized Sidelobe Cancellation and Multi-Channel Linear Prediction for Speech Dereverberation and Noise Reduction.
ICASSP2019
Marvin Tammen, Simon Doclo, Ina Kodrasi, 
Joint Estimation of RETF Vector and Power Spectral Densities for Speech Enhancement Based on Alternating Least Squares.
TASLP2018
Sebastian Braun, Adam Kuklasinski, Ofer Schwartz, Oliver Thiergart, Emanuël A. P. Habets, Sharon Gannot, Simon Doclo, Jesper Jensen 0001, 
Evaluation and Comparison of Late Reverberation Power Spectral Density Estimators.
TASLP2018
Ina Kodrasi, Simon Doclo, 
Analysis of Eigenvalue Decomposition-Based Late Reverberation Power Spectral Density Estimation.
TASLP2018
Daniel Marquardt, Simon Doclo, 
Interaural Coherence Preservation for Binaural Noise Reduction Using Partial Noise Estimation and Spectral Postfiltering.
ICASSP2018
Ina Kodrasi, Simon Doclo, 
Joint Late Reverberation and Noise Power Spectral Density Estimation in a Spatially Homogeneous Noise Field.
ICASSP2018
Marvin Tammen, Ina Kodrasi, Simon Doclo, 
Complexity Reduction of Eigenvalue Decomposition-Based Diffuse Power Spectral Density Estimators Using the Power Method.
TASLP2017
Ina Kodrasi, Simon Doclo, 
Signal-Dependent Penalty Functions for Robust Acoustic Multi-Channel Equalization.
TASLP2017
Adam Kuklasinski, Simon Doclo, Søren Holdt Jensen, Jesper Rindom Jensen, 
Correction to "Maximum Likelihood PSD Estimation for Speech Enhancement in Reverberation and Noise".
ICASSP2017
Hamza A. Javed, Benjamin Cauchi, Simon Doclo, Patrick A. Naylor, Stefan Goetze, 
Measuring, modelling and predicting perceived reverberation.
ICASSP2017
Ina Kodrasi, Simon Doclo, 
Late reverberant power spectral density estimation based on an eigenvalue decomposition.
ICASSP2017
Linh Thi Thuc Tran, Henning F. Schepker, Simon Doclo, Hai Huyen Dam, Sven Nordholm, 
Proportionate NLMS for adaptive feedback control in hearing aids.
TASLP2016
Ina Kodrasi, Simon Doclo, 
Joint Dereverberation and Noise Reduction Based on Acoustic Multi-Channel Equalization.
TASLP2016
Adam Kuklasinski, Simon Doclo, Søren Holdt Jensen, Jesper Jensen 0001, 
Maximum Likelihood PSD Estimation for Speech Enhancement in Reverberation and Noise.
TASLP2016
Nasser Mohammadiha, Simon Doclo, 
Speech Dereverberation Using Non-Negative Convolutive Transfer Function and Spectro-Temporal Modeling.
TASLP2016
Eugen Rasumow, Martin Hansen, Steven van de Par, Dirk Puschel, Volker Mellert, Simon Doclo, Matthias Blau, 
Regularization Approaches for Synthesizing HRTF Directivity Patterns.
TASLP2016
Henning F. Schepker, Simon Doclo, 
A Semidefinite Programming Approach to Min-max Estimation of the Common Part of Acoustic Feedback Paths in Hearing Aids.
ICASSP2022
Zhehuai Chen, Yu Zhang 0033, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno 0001, Gary Wang, 
Tts4pretrain 2.0: Advancing the use of Text and Speech in ASR Pretraining with Consistency and Contrastive Losses.
Interspeech2022
Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang 0033, Nicolás Serrano, 
Reducing Domain mismatch in Self-supervised speech pre-training.
Interspeech2022
Fadi Biadsy, Youzheng Chen, Xia Zhang, Oleg Rybakov, Andrew Rosenberg, Pedro J. Moreno 0001, 
A Scalable Model Specialization Framework for Training and Inference using Submodels and its Application to Speech Model Personalization.
Interspeech2022
Zhehuai Chen, Yu Zhang 0033, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno 0001, Ankur Bapna, Heiga Zen, 
MAESTRO: Matched Speech Text Representations through Modality Matching.
Interspeech2022
Cal Peyser, W. Ronny Huang, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho, 
Towards Disentangled Speech Representations.
Interspeech2022
Gary Wang, Andrew Rosenberg, Bhuvana Ramabhadran, Fadi Biadsy, Jesse Emond, Yinghui Huang, Pedro J. Moreno 0001, 
Non-Parallel Voice Conversion for ASR Augmentation.
ICASSP2021
Rohan Doshi, Youzheng Chen, Liyang Jiang, Xia Zhang, Fadi Biadsy, Bhuvana Ramabhadran, Fang Chu, Andrew Rosenberg, Pedro J. Moreno 0001, 
Extending Parrotron: An End-to-End, Speech Conversion and Speech Recognition Model for Atypical Speech.
Interspeech2021
Zhehuai Chen, Andrew Rosenberg, Yu Zhang 0033, Heiga Zen, Mohammadreza Ghodsi, Yinghui Huang, Jesse Emond, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation.
ICASSP2020
Gary Wang, Andrew Rosenberg, Zhehuai Chen, Yu Zhang 0033, Bhuvana Ramabhadran, Yonghui Wu, Pedro J. Moreno 0001, 
Improving Speech Recognition Using Consistent Predictions on Synthesized Speech.
Interspeech2020
Zhehuai Chen, Andrew Rosenberg, Yu Zhang 0033, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Improving Speech Recognition Using GAN-Based Speech Synthesis and Contrastive Unspoken Text Selection.
Interspeech2020
Gary Wang, Andrew Rosenberg, Zhehuai Chen, Yu Zhang 0033, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
SCADA: Stochastic, Consistent and Adversarial Data Augmentation to Improve ASR.
ICASSP2019
Min Ma, Bhuvana Ramabhadran, Jesse Emond, Andrew Rosenberg, Fadi Biadsy, 
Comparison of Data Augmentation and Adaptation Strategies for Code-switched Automatic Speech Recognition.
Interspeech2019
Yu Zhang 0033, Ron J. Weiss, Heiga Zen, Yonghui Wu, Zhifeng Chen, R. J. Skerry-Ryan, Ye Jia, Andrew Rosenberg, Bhuvana Ramabhadran, 
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning.
ICASSP2018
Andrew Rosenberg, Raul Fernandez, Bhuvana Ramabhadran, 
Measuring the Effect of Linguistic Resources on Prosody Modeling for Speech Synthesis.
ICASSP2018
Xuesong Yang, Kartik Audhkhasi, Andrew Rosenberg, Samuel Thomas 0001, Bhuvana Ramabhadran, Mark Hasegawa-Johnson, 
Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition.
Interspeech2018
Takashi Fukuda, Raul Fernandez, Andrew Rosenberg, Samuel Thomas 0001, Bhuvana Ramabhadran, Alexander Sorin, Gakuto Kurata, 
Data Augmentation Improves Recognition of Foreign Accented Speech.
ICASSP2017
Kartik Audhkhasi, Andrew Rosenberg, Abhinav Sethy, Bhuvana Ramabhadran, Brian Kingsbury, 
End-to-end ASR-free keyword search from speech.
ICASSP2017
Andrew Rosenberg, Kartik Audhkhasi, Abhinav Sethy, Bhuvana Ramabhadran, Michael Picheny, 
End-to-end speech recognition and keyword search on low-resource languages.
ICASSP2017
Ali Raza Syed, Andrew Rosenberg, Michael I. Mandel, 
Active learning for low-resource speech recognition: Impact of selection size and language modeling data.
Interspeech2017
Asaf Rendel, Raul Fernandez, Zvi Kons, Andrew Rosenberg, Ron Hoory, Bhuvana Ramabhadran, 
Weakly-Supervised Phrase Assignment from Text in a Speech-Synthesis System Using Noisy Labels.
ICASSP2022
Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
The First Multimodal Information Based Speech Processing (Misp) Challenge: Data, Tasks, Baselines And Results.
Interspeech2022
Hang Chen, Jun Du, Yusheng Dai, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Baocai Yin, Jia Pan, 
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.
Interspeech2022
Hengshun Zhou, Jun Du, Gongzhen Zou, Zhaoxu Nian, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Shifu Xiong, Jianqing Gao, 
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis.
ICASSP2021
Abdolreza Sabzi Shahrebabaki, Negar Olfati, Ali Shariq Imran, Magne Hallstein Johnsen, Sabato Marco Siniscalchi, Torbjørn Svendsen, 
A Two-Stage Deep Modeling Approach to Articulatory Inversion.
ICASSP2021
Chao-Han Huck Yang, Jun Qi 0002, Samuel Yen-Chi Chen, Pin-Yu Chen, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee, 
Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition.
Interspeech2021
Abdolreza Sabzi Shahrebabaki, Sabato Marco Siniscalchi, Torbjørn Svendsen, 
Raw Speech-to-Articulatory Inversion by Temporal Filtering and Decimation.
Interspeech2021
Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee, 
PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification.
TASLP2020
Ivan Kukanov, Trung Ngo Trong, Ville Hautamäki, Sabato Marco Siniscalchi, Valerio Mario Salerno, Kong Aik Lee, 
Maximal Figure-of-Merit Framework to Detect Multi-Label Phonetic Features for Spoken Language Recognition.
ICASSP2020
Jun Qi 0002, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee, 
Tensor-To-Vector Regression for Multi-Channel Speech Enhancement Based on Tensor-Train Network.
Interspeech2020
Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Xue Bai, Jun Du, Chin-Hui Lee, 
An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances.
Interspeech2020
Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee, 
Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification.
Interspeech2020
Jun Qi 0002, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee, 
Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement.
Interspeech2020
Abdolreza Sabzi Shahrebabaki, Negar Olfati, Sabato Marco Siniscalchi, Giampiero Salvi, Torbjørn Svendsen, 
Transfer Learning of Articulatory Information Through Phone Information.
Interspeech2020
Abdolreza Sabzi Shahrebabaki, Sabato Marco Siniscalchi, Giampiero Salvi, Torbjørn Svendsen, 
Sequence-to-Sequence Articulatory Inversion Through Time Convolution of Sub-Band Frequency Signals.
TASLP2019
Wei Li 0119, Nancy F. Chen, Sabato Marco Siniscalchi, Chin-Hui Lee, 
Improving Mispronunciation Detection of Mandarin Tones for Non-Native Learners With Soft-Target Tone Labels and BLSTM-Based Deep Tone Models.
TASLP2019
Jun Qi 0002, Jun Du, Sabato Marco Siniscalchi, Chin-Hui Lee, 
A Theory on Deep Neural Network Based Vector-to-Vector Regression With an Illustration of Its Expressive Power in Speech Enhancement.
Interspeech2019
Abdolreza Sabzi Shahrebabaki, Negar Olfati, Ali Shariq Imran, Sabato Marco Siniscalchi, Torbjørn Svendsen, 
A Phonetic-Level Analysis of Different Input Features for Articulatory Inversion.
ICASSP2018
Wei Li 0119, Nancy F. Chen, Sabato Marco Siniscalchi, Chin-Hui Lee, 
Improving Mandarin Tone Mispronunciation Detection for Non-Native Learners with Soft-Target Tone Labels and BLSTM-Based Deep Models.
TASLP2017
Zhen Huang 0001, Sabato Marco Siniscalchi, Chin-Hui Lee, 
Bayesian Unsupervised Batch and Online Speaker Adaptation of Activation Function Parameters in Deep Models for Automatic Speech Recognition.
ICASSP2017
Sicheng Wang, Kehuang Li, Zhen Huang 0001, Sabato Marco Siniscalchi, Chin-Hui Lee, 
A transfer learning and progressive stacking approach to reducing deep model sizes with an application to speech enhancement.
ICASSP2022
Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu 0003, Zhenyu Tang 0001, Dinesh Manocha, Dong Yu 0001, 
Fast-Rir: Fast Neural Diffuse Room Impulse Response Generator.
ICASSP2022
Brian Yan, Chunlei Zhang, Meng Yu 0003, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe 0001, Dong Yu 0001, 
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization.
Interspeech2022
Vinay Kothapally, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Dong Yu 0001, 
Joint Neural AEC and Beamforming with Double-Talk Detection.
TASLP2021
Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu 0004, Meng Yu 0003, Dong Yu 0001, Jesper Jensen 0001, 
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation.
TASLP2021
Jianwei Yu, Shi-Xiong Zhang, Bo Wu, Shansong Liu, Shoukang Hu, Mengzhe Geng, Xunying Liu, Helen Meng, Dong Yu 0001, 
Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech.
TASLP2021
Zhuohuang Zhang, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Lianwu Chen, Donald S. Williamson, Dong Yu 0001, 
Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation.
ICASSP2021
Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe 0001, Meng Yu 0003, Yong Xu 0004, Shi-Xiong Zhang, Dong Yu 0001, 
Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization.
Interspeech2021
Saurabh Kataria, Shi-Xiong Zhang, Dong Yu 0001, 
Multi-Channel Speaker Verification for Single and Multi-Talker Speech.
Interspeech2021
Xiyun Li, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Jiaming Xu 0001, Bo Xu 0002, Dong Yu 0001, 
MIMO Self-Attentive RNN Beamformer for Multi-Speaker Speech Separation.
Interspeech2021
Helin Wang, Bo Wu, Lianwu Chen, Meng Yu 0003, Jianwei Yu, Yong Xu 0004, Shi-Xiong Zhang, Chao Weng, Dan Su 0002, Dong Yu 0001, 
TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation.
Interspeech2021
Yong Xu 0004, Zhuohuang Zhang, Meng Yu 0003, Shi-Xiong Zhang, Dong Yu 0001, 
Generalized Spatio-Temporal RNN Beamformer for Target Speech Separation.
Interspeech2021
Meng Yu 0003, Chunlei Zhang, Yong Xu 0004, Shi-Xiong Zhang, Dong Yu 0001, 
MetricNet: Towards Improved Modeling For Non-Intrusive Speech Quality Assessment.
ICASSP2020
Rongzhi Gu, Shi-Xiong Zhang, Lianwu Chen, Yong Xu 0004, Meng Yu 0003, Dan Su 0002, Yuexian Zou, Dong Yu 0001, 
Enhancing End-to-End Multi-Channel Speech Separation Via Spatial Feature Learning.
ICASSP2020
Aswin Shanmugam Subramanian, Chao Weng, Meng Yu 0003, Shi-Xiong Zhang, Yong Xu 0004, Shinji Watanabe 0001, Dong Yu 0001, 
Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives.
ICASSP2020
Jianwei Yu, Shi-Xiong Zhang, Jian Wu 0027, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu 0001, 
Audio-Visual Recognition of Overlapped Speech for the LRS2 Dataset.
Interspeech2020
Shansong Liu, Xurong Xie, Jianwei Yu, Shoukang Hu, Mengzhe Geng, Rongfeng Su, Shi-Xiong Zhang, Xunying Liu, Helen Meng, 
Exploiting Cross-Domain Visual Feature Generation for Disordered Speech Recognition.
Interspeech2020
Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Lianwu Chen, Chao Weng, Jianming Liu, Dong Yu 0001, 
Neural Spatio-Temporal Beamformer for Target Speech Separation.
Interspeech2020
Jianwei Yu, Bo Wu, Rongzhi Gu, Shi-Xiong Zhang, Lianwu Chen, Yong Xu 0004, Meng Yu 0003, Dan Su 0002, Dong Yu 0001, Xunying Liu, Helen Meng, 
Audio-Visual Multi-Channel Recognition of Overlapped Speech.
ICASSP2019
Shi-Xiong Zhang, Yifan Gong 0001, Dong Yu 0001, 
Encrypted Speech Recognition Using Deep Polynomial Networks.
Interspeech2019
Fahimeh Bahmaninezhad, Jian Wu 0027, Rongzhi Gu, Shi-Xiong Zhang, Yong Xu 0004, Meng Yu 0003, Dong Yu 0001, 
A Comprehensive Study of Speech Separation: Spectrogram vs Waveform Separation.
ICASSP2022
Zehua Chen, Xu Tan 0003, Ke Wang, Shifeng Pan, Danilo P. Mandic, Lei He 0005, Sheng Zhao, 
Infergrad: Improving Diffusion Models for Vocoder by Considering Inference in Training.
ICASSP2022
Yuanhao Yi, Lei He 0005, Shifeng Pan, Xi Wang 0016, Yujia Xiao, 
Prosodyspeech: Towards Advanced Prosody Model for Neural Text-to-Speech.
ICASSP2022
Fengpeng Yue, Yan Deng, Lei He 0005, Tom Ko, Yu Zhang 0006, 
Exploring Machine Speech Chain For Domain Adaptation.
Interspeech2022
Mutian He 0001, Jingzhou Yang, Lei He 0005, Frank K. Soong, 
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge.
Interspeech2022
Yanqing Liu, Ruiqing Xue, Lei He 0005, Xu Tan 0003, Sheng Zhao, 
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders.
Interspeech2022
Yihan Wu, Xu Tan 0003, Bohan Li 0003, Lei He 0005, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu, 
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios.
Interspeech2022
Yihan Wu, Xi Wang 0016, Shaofei Zhang, Lei He 0005, Ruihua Song, Jian-Yun Nie, 
Self-supervised Context-aware Style Representation for Expressive Speech Synthesis.
Interspeech2022
Yuanhao Yi, Lei He 0005, Shifeng Pan, Xi Wang 0016, Yuchao Zhang, 
SoftSpeech: Unsupervised Duration Model in FastSpeech 2.
Interspeech2021
Yan Deng, Rui Zhao 0017, Zhong Meng, Xie Chen 0001, Bing Liu, Jinyu Li 0001, Yifan Gong 0001, Lei He 0005, 
Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS.
Interspeech2021
Shifeng Pan, Lei He 0005, 
Cross-Speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis.
ICASSP2020
Yan Huang 0028, Lei He 0005, Wenning Wei, William Gale, Jinyu Li 0001, Yifan Gong 0001, 
Using Personalized Speech Synthesis and Neural Language Generator for Rapid Speaker Adaptation.
ICASSP2020
Eva Sharma, Guoli Ye, Wenning Wei, Rui Zhao 0017, Yao Tian, Jian Wu 0027, Lei He 0005, Ed Lin, Yifan Gong 0001, 
Adaptation of RNN Transducer with Text-To-Speech Technology for Keyword Spotting.
ICASSP2020
Yujia Xiao, Lei He 0005, Huaiping Ming, Frank K. Soong, 
Improving Prosody with Linguistic and Bert Derived Features in Multi-Speaker Based Mandarin Chinese Neural TTS.
Interspeech2020
Yan Huang 0028, Jinyu Li 0001, Lei He 0005, Wenning Wei, William Gale, Yifan Gong 0001, 
Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator.
Interspeech2020
Yang Cui, Xi Wang 0016, Lei He 0005, Frank K. Soong, 
An Efficient Subband Linear Prediction for LPCNet-Based Neural Synthesis.
Interspeech2020
Jinyu Li 0001, Rui Zhao 0017, Zhong Meng, Yanqing Liu, Wenning Wei, Sarangarajan Parthasarathy, Vadim Mazalov, Zhenghao Wang, Lei He 0005, Sheng Zhao, Yifan Gong 0001, 
Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability.
ICASSP2019
Yajie Zhang, Shifeng Pan, Lei He 0005, Zhen-Hua Ling, 
Learning Latent Representations for Style Control and Transfer in End-to-end Speech Synthesis.
Interspeech2019
Haohan Guo, Frank K. Soong, Lei He 0005, Lei Xie 0001, 
A New GAN-Based End-to-End TTS Training Algorithm.
Interspeech2019
Haohan Guo, Frank K. Soong, Lei He 0005, Lei Xie 0001, 
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS.
Interspeech2019
Mutian He 0001, Yan Deng, Lei He 0005, 
Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS.
ICASSP2022
Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan 0003, Brian Yan, Ngoc Thang Vu, Alan W. Black, Shinji Watanabe 0001, 
ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet.
ICASSP2022
Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux, 
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR.
ICASSP2022
Takashi Maekaku, Xuankai Chang, Yuya Fujita, Shinji Watanabe 0001, 
An Exploration of Hubert with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion.
Interspeech2022
Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan W. Black, Shinji Watanabe 0001, 
Two-Pass Low Latency End-to-End Spoken Language Understanding.
Interspeech2022
Xuankai Chang, Takashi Maekaku, Yuya Fujita, Shinji Watanabe 0001, 
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation.
Interspeech2022
Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao 0001, Yanmin Qian, Shinji Watanabe 0001, 
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.
Interspeech2022
Jiatong Shi, Shuai Guo, Tao Qian, Tomoki Hayashi, Yuning Wu, Fangzheng Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe 0001, Qin Jin, 
Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis.
ACL2022
Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-Wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li 0001, Shinji Watanabe 0001, Abdelrahman Mohamed, Hung-yi Lee, 
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities.
ICASSP2021
Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi 0003, Shinji Watanabe 0001, Kun Wei, Wangyou Zhang, Yuekai Zhang, 
Recent Developments on Espnet Toolkit Boosted By Conformer.
Interspeech2021
Pengcheng Guo, Xuankai Chang, Shinji Watanabe 0001, Lei Xie 0001, 
Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain.
Interspeech2021
Takashi Maekaku, Xuankai Chang, Yuya Fujita, Li-Wei Chen, Shinji Watanabe 0001, Alexander I. Rudnicky, 
Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021.
Interspeech2021
Tianzi Wang, Yuya Fujita, Xuankai Chang, Shinji Watanabe 0001, 
Streaming End-to-End ASR Based on Blockwise Non-Autoregressive Models.
Interspeech2021
Shu-Wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li 0001, Shinji Watanabe 0001, Abdelrahman Mohamed, Hung-yi Lee, 
SUPERB: Speech Processing Universal PERformance Benchmark.
TASLP2020
Wangyou Zhang, Xuankai Chang, Yanmin Qian, Shinji Watanabe 0001, 
Improving End-to-End Single-Channel Multi-Talker Speech Recognition.
ICASSP2020
Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe 0001, 
End-To-End Multi-Speaker Speech Recognition With Transformer.
Interspeech2020
Xuankai Chang, Aswin Shanmugam Subramanian, Pengcheng Guo, Shinji Watanabe 0001, Yuya Fujita, Motoi Omachi, 
End-to-End ASR with Adaptive Span Self-Attention.
Interspeech2020
Yuya Fujita, Shinji Watanabe 0001, Motoi Omachi, Xuankai Chang, 
Insertion-Based Modeling for End-to-End Automatic Speech Recognition.
Interspeech2020
Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Shinji Watanabe 0001, Yanmin Qian, 
End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming.
Interspeech2019
Wangyou Zhang, Xuankai Chang, Yanmin Qian, 
Knowledge Distillation for End-to-End Monaural Multi-Talker ASR System.
SpeechComm2018
Yanmin Qian, Xuankai Chang, Dong Yu 0001, 
Single-channel multi-talker speech recognition with permutation invariant training.
SpeechComm2023
Qiujia Li, Chao Zhang 0031, Philip C. Woodland, 
Combining hybrid DNN-HMM ASR systems with attention-based models using lattice rescoring.
Interspeech2022
Guangzhi Sun, Chao Zhang 0031, Philip C. Woodland, 
Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition.
Interspeech2022
Xianrui Zheng, Chao Zhang 0031, Philip C. Woodland, 
Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription.
ICASSP2021
Guangzhi Sun, Chao Zhang 0031, Philip C. Woodland, 
Transformer Language Models with LSTM-Based Cross-Utterance Information Representation.
ICASSP2021
Guangzhi Sun, D. Liu, Chao Zhang 0031, Philip C. Woodland, 
Content-Aware Speaker Embeddings for Speaker Diarisation.
ICASSP2021
Wen Wu, Chao Zhang 0031, Philip C. Woodland, 
Emotion Recognition by Fusing Time Synchronous and Time Asynchronous Representations.
ICASSP2021
Wei Xue, Gang Quan, Chao Zhang 0031, Guohong Ding, Xiaodong He 0001, Bowen Zhou, 
Neural Kalman Filtering for Speech Enhancement.
Interspeech2021
Dongcheng Jiang, Chao Zhang 0031, Philip C. Woodland, 
Variable Frame Rate Acoustic Models Using Minimum Error Reinforcement Learning.
Interspeech2020
Wei Song, Guanghui Xu, Zhengchen Zhang, Chao Zhang 0031, Xiaodong He 0001, Bowen Zhou, 
Efficient WaveGlow: An Improved WaveGlow Vocoder with Enhanced Speed.
Interspeech2020
Ying Tong, Wei Xue, Shanluo Huang, Lu Fan, Chao Zhang 0031, Guohong Ding, Xiaodong He 0001, 
The JD AI Speaker Verification System for the FFSVC 2020 Challenge.
Interspeech2020
Wei Xue, Ying Tong, Chao Zhang 0031, Guohong Ding, Xiaodong He 0001, Bowen Zhou, 
Sound Event Localization and Detection Based on Multiple DOA Beamforming and Multi-Task Learning.
ICASSP2019
Guangzhi Sun, Chao Zhang 0031, Philip C. Woodland, 
Speaker Diarisation Using 2D Self-attentive Combination of Embeddings.
ICASSP2019
Chao Zhang 0031, Florian L. Kreyssig, Qiujia Li, Philip C. Woodland, 
PyHTK: Python Library and ASR Pipelines for HTK.
Interspeech2019
Patrick von Platen, Chao Zhang 0031, Philip C. Woodland, 
Multi-Span Acoustic Modelling Using Raw Waveform Signals.
Interspeech2019
Wei Xue, Ying Tong, Guohong Ding, Chao Zhang 0031, Tao Ma, Xiaodong He 0001, Bowen Zhou, 
Direct-Path Signal Cross-Correlation Estimation for Sound Source Localization in Reverberation.
ICASSP2018
Florian L. Kreyssig, Chao Zhang 0031, Philip C. Woodland, 
Improved Tdnns Using Deep Kernels and Frequency Dependent Grid-RNNS.
ICASSP2018
Chao Zhang 0031, Philip C. Woodland, 
High Order Recurrent Neural Networks for Acoustic Modelling.
Interspeech2018
Yu Wang 0027, Chao Zhang 0031, Mark J. F. Gales, Philip C. Woodland, 
Speaker Adaptation and Adaptive Training for Jointly Optimised Tandem Systems.
Interspeech2018
Chao Zhang 0031, Philip C. Woodland, 
Semi-tied Units for Efficient Gating in LSTM and Highway Networks.
ICASSP2017
Chao Zhang 0031, Philip C. Woodland, 
Joint optimisation of tandem systems using Gaussian mixture density neural network discriminative sequence training.
ICASSP2022
Huajian Fang, Tal Peer, Stefan Wermter, Timo Gerkmann, 
Integrating Statistical Uncertainty into Neural Network-Based Speech Enhancement.
ICASSP2022
Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann, 
Customizable End-To-End Optimization Of Online Neural Network-Supported Dereverberation For Hearing Devices.
Interspeech2022
Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann, 
Neural Network-augmented Kalman Filtering for Robust Online Speech Dereverberation in Noisy Reverberant Environments.
Interspeech2022
Danilo de Oliveira, Tal Peer, Timo Gerkmann, 
Efficient Transformer-based Speech Enhancement Using Long Frames and STFT Magnitudes.
Interspeech2022
Navin Raj Prabhu, Guillaume Carbajal, Nale Lehmann-Willenbrock, Timo Gerkmann, 
End-To-End Label Uncertainty Modeling for Speech-based Arousal Recognition Using Bayesian Neural Networks.
Interspeech2022
Kristina Tesch, Nils-Hendrik Mohrmann, Timo Gerkmann, 
On the Role of Spatial, Spectral, and Temporal Processing for DNN-based Non-linear Multi-channel Speech Enhancement.
Interspeech2022
Simon Welker, Julius Richter, Timo Gerkmann, 
Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain.
TASLP2021
Kristina Tesch, Timo Gerkmann, 
Nonlinear Spatial Filtering in Multichannel Speech Enhancement.
ICASSP2021
Guillaume Carbajal, Julius Richter, Timo Gerkmann, 
Guided Variational Autoencoder for Speech Enhancement with a Supervised Classifier.
NeurIPS2021
Xiaolin Hu 0001, Kai Li, Weiyi Zhang, Yi Luo, Jean-Marie Lemercier, Timo Gerkmann, 
Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network.
ICASSP2020
David Ditter, Timo Gerkmann, 
A Multi-Phase Gammatone Filterbank for Speech Separation Via Tasnet.
ICASSP2020
Kristina Tesch, Timo Gerkmann, 
Nonlinear Spatial Filtering for Multichannel Speech Enhancement in Inhomogeneous Noise Fields.
Interspeech2020
Julius Richter, Guillaume Carbajal, Timo Gerkmann, 
Speech Enhancement with Stochastic Temporal Convolutional Networks.
ICASSP2019
Robert Rehr, Timo Gerkmann, 
An Analysis of Noise-aware Features in Combination with the Size and Diversity of Training Data for DNN-based Speech Enhancement.
Interspeech2019
David Ditter, Timo Gerkmann, 
Influence of Speaker-Specific Parameters on Speech Separation Systems.
Interspeech2019
Kristina Tesch, Robert Rehr, Timo Gerkmann, 
On Nonlinear Spatial Filtering in Multichannel Speech Enhancement.
TASLP2018
Martin Krawczyk-Becker, Timo Gerkmann, 
On Speech Enhancement Under PSD Uncertainty.
TASLP2018
Robert Rehr, Timo Gerkmann, 
On the Importance of Super-Gaussian Speech Priors for Machine-Learning Based Speech Enhancement.
ICASSP2018
Martin Krawczyk-Becker, Timo Gerkmann, 
Nonlinear Speech Enhancement Under Speech PSD Uncertainty.
TASLP2017
Robert Rehr, Timo Gerkmann, 
An Analysis of Adaptive Recursive Smoothing with Applications to Noise PSD Estimation.
TASLP2022
Gaku Kotani, Daisuke Saito, Nobuaki Minematsu, 
Voice Conversion Based on Deep Neural Networks for Time-Variant Linear Transformations.
TASLP2022
Hitoshi Suda, Daisuke Saito, Satoru Fukayama, Tomoyasu Nakano, Masataka Goto, 
Singer Diarization for Polyphonic Music With Unison Singing.
ICASSP2022
Eisuke Konno, Daisuke Saito, Nobuaki Minematsu, 
Quantifying Discriminability between NMF Bases.
Interspeech2022
Takeru Gorai, Daisuke Saito, Nobuaki Minematsu, 
Text-to-speech synthesis using spectral modeling based on non-negative autoencoder.
Interspeech2022
Takuya Kunihara, Chuanbo Zhu 0001, Daisuke Saito, Nobuaki Minematsu, Noriko Nakanishi, 
Detection of Learners' Listening Breakdown with Oral Dictation and Its Use to Model Listening Skill Improvement Exclusively Through Shadowing.
Interspeech2021
Shintaro Ando, Nobuaki Minematsu, Daisuke Saito, 
Lexical Density Analysis of Word Productions in Japanese English Using Acoustic Word Embeddings.
Interspeech2020
Tatsuma Ishihara, Daisuke Saito, 
Attention-Based Speaker Embeddings for One-Shot Voice Conversion.
Interspeech2020
Zhenchao Lin, Ryo Takashima, Daisuke Saito, Nobuaki Minematsu, Noriko Nakanishi, 
Shadowability Annotation with Fine Granularity on L2 Utterances and its Improvement with Native Listeners' Script-Shadowing.
Interspeech2020
Yuma Shirahata, Daisuke Saito, Nobuaki Minematsu, 
Discriminative Method to Extract Coarse Prosodic Structure and its Application for Statistical Phrase/Accent Command Estimation.
Interspeech2020
Hitoshi Suda, Gaku Kotani, Daisuke Saito, 
Nonparallel Training of Exemplar-Based Voice Conversion System Using INCA-Based Alignment Technique.
TASLP2019
Tetsuya Hashimoto, Daisuke Saito, Nobuaki Minematsu, 
Many-to-Many and Completely Parallel-Data-Free Voice Conversion Based on Eigenspace DNN.
Interspeech2019
Tasavat Trisitichoke, Shintaro Ando, Daisuke Saito, Nobuaki Minematsu, 
Analysis of Native Listeners' Facial Microexpressions While Shadowing Non-Native Speech - Potential of Shadowers' Facial Expressions for Comprehensibility Prediction.
Interspeech2018
Yusuke Inoue 0004, Suguru Kabashima, Daisuke Saito, Nobuaki Minematsu, Kumi Kanamura, Yutaka Yamauchi, 
A Study of Objective Measurement of Comprehensibility through Native Speakers' Shadowing of Learners' Utterances.
Interspeech2018
Yasuhito Ohsugi, Daisuke Saito, Nobuaki Minematsu, 
A Comparative Study of Statistical Conversion of Face to Voice Based on Their Subjective Impressions.
Interspeech2017
Tetsuya Hashimoto, Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu, 
Parallel-Data-Free Many-to-Many Voice Conversion Based on DNN Integrated with Eigenspace Using a Non-Parallel Speech Corpus.
Interspeech2017
Shohei Toyama, Daisuke Saito, Nobuaki Minematsu, 
Use of Global and Acoustic Features Associated with Contextual Factors to Adapt Language Models for Spontaneous Speech Recognition.
Interspeech2017
Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu, 
Acoustic-to-Articulatory Mapping Based on Mixture of Probabilistic Canonical Correlation Analysis.
Interspeech2017
Junwei Yue, Fumiya Shiozawa, Shohei Toyama, Yutaka Yamauchi, Kayoko Ito, Daisuke Saito, Nobuaki Minematsu, 
Automatic Scoring of Shadowing Speech Based on DNN Posteriors and Their DTW.
TASLP2016
Zhizheng Wu 0001, Phillip L. De Leon, Cenk Demiroglu, Ali Khodabakhsh 0001, Simon King, Zhen-Hua Ling, Daisuke Saito, Bryan Stewart, Tomoki Toda, Mirjam Wester, Junichi Yamagishi, 
Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance.
Interspeech2016
Shuju Shi, Yosuke Kashiwagi, Shohei Toyama, Junwei Yue, Yutaka Yamauchi, Daisuke Saito, Nobuaki Minematsu, 
Automatic Assessment and Error Detection of Shadowing Speech: Case of English Spoken by Japanese Learners.
ICASSP2022
Ladislav Mosner, Oldrich Plchot, Lukás Burget, Jan Honza Cernocký, 
Multisv: Dataset for Far-Field Multi-Channel Speaker Verification.
Interspeech2022
Niko Brummer, Albert Swart, Ladislav Mosner, Anna Silnova, Oldrich Plchot, Themos Stafylakis, Lukás Burget, 
Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings.
Interspeech2022
Junyi Peng, Rongzhi Gu, Ladislav Mosner, Oldrich Plchot, Lukás Burget, Jan Cernocký, 
Learnable Sparse Filterbank for Speaker Verification.
Interspeech2022
Themos Stafylakis, Ladislav Mosner, Oldrich Plchot, Johan Rohdin, Anna Silnova, Lukás Burget, Jan Cernocký, 
Training speaker embedding extractors using multi-speaker audio with unknown speaker boundaries.
TASLP2020
Santosh Kesiraju, Oldrich Plchot, Lukás Burget, Suryakanth V. Gangashetty, 
Learning Document Embeddings Along With Their Uncertainties.
ICASSP2020
Shuai Wang 0016, Johan Rohdin, Oldrich Plchot, Lukás Burget, Kai Yu 0004, Jan Cernocký, 
Investigation of Specaugment for Deep Speaker Embedding Learning.
ICASSP2020
Federico Landini, Shuai Wang 0016, Mireia Díez, Lukás Burget, Pavel Matejka, Katerina Zmolíková, Ladislav Mosner, Anna Silnova, Oldrich Plchot, Ondrej Novotný, Hossein Zeinali, Johan Rohdin, 
But System for the Second Dihard Speech Diarization Challenge.
Interspeech2020
Alicia Lozano-Diez, Anna Silnova, Bhargav Pulugundla, Johan Rohdin, Karel Veselý, Lukás Burget, Oldrich Plchot, Ondrej Glembek, Ondrej Novotný, Pavel Matejka, 
BUT Text-Dependent Speaker Verification System for SdSV Challenge 2020.
ICASSP2019
Ondřej Novotný, Oldřich Plchot, Ondřej Glembek, Lukáš Burget, Pavel Matějka, 
Discriminatively Re-trained I-vector Extractor for Speaker Recognition.
ICASSP2019
Johan Rohdin, Themos Stafylakis, Anna Silnova, Hossein Zeinali, Lukáš Burget, Oldřich Plchot, 
Speaker Verification Using End-to-end Adversarial Language Adaptation.
Interspeech2019
Pavel Matějka, Oldřich Plchot, Hossein Zeinali, Ladislav Mošner, Anna Silnova, Lukáš Burget, Ondřej Novotný, Ondřej Glembek, 
Analysis of BUT Submission in Far-Field Scenarios of VOiCES 2019 Challenge.
Interspeech2019
Ondřej Novotný, Oldřich Plchot, Ondřej Glembek, Lukáš Burget, 
Factorization of Discriminatively Trained i-Vector Extractor for Speaker Recognition.
Interspeech2019
Themos Stafylakis, Johan Rohdin, Oldřich Plchot, Petr Mizera, Lukáš Burget, 
Self-Supervised Speaker Embeddings.
Interspeech2019
Shuai Wang 0016, Johan Rohdin, Lukáš Burget, Oldřich Plchot, Yanmin Qian, Kai Yu 0004, Jan Černocký, 
On the Usage of Phonetic Information for Text-Independent Speaker Embedding Extraction.
ICASSP2018
Alicia Lozano-Diez, Oldřich Plchot, Pavel Matějka, Joaquin Gonzalez-Rodriguez, 
DNN Based Embeddings for Language Recognition.
Interspeech2018
Mireia Díez, Federico Landini, Lukáš Burget, Johan Rohdin, Anna Silnova, Kateřina Žmolíková, Ondřej Novotný, Karel Veselý, Ondřej Glembek, Oldřich Plchot, Ladislav Mošner, Pavel Matějka, 
BUT System for DIHARD Speech Diarization Challenge 2018.
Interspeech2018
Ladislav Mošner, Oldřich Plchot, Pavel Matějka, Ondřej Novotný, Jan Černocký, 
Dereverberation and Beamforming in Robust Far-Field Speaker Recognition.
Interspeech2017
Pavel Matějka, Ondřej Novotný, Oldřich Plchot, Lukáš Burget, Mireia Díez Sánchez, Jan Černocký, 
Analysis of Score Normalization in Multilingual Speaker Recognition.
Interspeech2017
Oldřich Plchot, Pavel Matějka, Anna Silnova, Ondřej Novotný, Mireia Díez Sánchez, Johan Rohdin, Ondřej Glembek, Niko Brümmer, Albert Swart, Jesús Jorrín-Prieto, Paola García, Luis Buera, Patrick Kenny, Md. Jahangir Alam, Gautam Bhattacharya, 
Analysis and Description of ABC Submission to NIST SRE 2016.
ICASSP2022
Liang Lu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Endpoint Detection for Streaming End-to-End Multi-Talker ASR.
ICASSP2022
Desh Raj, Liang Lu 0001, Zhuo Chen 0006, Yashesh Gaur, Jinyu Li 0001, 
Continuous Streaming Multi-Talker ASR with Dual-Path Transducers.
ICASSP2021
Naoyuki Kanda, Zhong Meng, Liang Lu 0001, Yashesh Gaur, Xiaofei Wang 0009, Zhuo Chen 0006, Takuya Yoshioka, 
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.
ICASSP2021
Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu 0001, Xie Chen 0001, Jinyu Li 0001, Yifan Gong 0001, 
Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition.
ICASSP2021
Eric Sun, Liang Lu 0001, Zhong Meng, Yifan Gong 0001, 
Sequence-Level Self-Teaching Regularization.
Interspeech2021
Liang Lu 0001, Naoyuki Kanda, Jinyu Li 0001, Yifan Gong 0001, 
Streaming Multi-Talker Speech Recognition with Joint Speaker Identification.
Interspeech2021
Liang Lu 0001, Zhong Meng, Naoyuki Kanda, Jinyu Li 0001, Yifan Gong 0001, 
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer.
Interspeech2021
Zhong Meng, Yu Wu 0012, Naoyuki Kanda, Liang Lu 0001, Xie Chen 0001, Guoli Ye, Eric Sun, Jinyu Li 0001, Yifan Gong 0001, 
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition.
ICASSP2020
Zhuo Chen 0006, Takuya Yoshioka, Liang Lu 0001, Tianyan Zhou, Zhong Meng, Yi Luo 0004, Jian Wu 0027, Xiong Xiao, Jinyu Li 0001, 
Continuous Speech Separation: Dataset and Analysis.
ICASSP2020
Hirofumi Inaguma, Yashesh Gaur, Liang Lu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR.
Interspeech2020
Chengyi Wang 0002, Yu Wu 0012, Yujiao Du, Jinyu Li 0001, Shujie Liu 0001, Liang Lu 0001, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou 0001, 
Semantic Mask for Transformer Based End-to-End Speech Recognition.
Interspeech2020
Chengyi Wang 0002, Yu Wu 0012, Liang Lu 0001, Shujie Liu 0001, Jinyu Li 0001, Guoli Ye, Ming Zhou 0001, 
Low Latency End-to-End Streaming Speech Recognition with a Scout Network.
Interspeech2020
Liang Lu 0001, Changliang Liu, Jinyu Li 0001, Yifan Gong 0001, 
Exploring Transformers for Large-Scale Speech Recognition.
Interspeech2020
Jeremy H. M. Wong, Yashesh Gaur, Rui Zhao 0017, Liang Lu 0001, Eric Sun, Jinyu Li 0001, Yifan Gong 0001, 
Combination of End-to-End and Hybrid Models for Speech Recognition.
ICASSP2019
Jinyu Li 0001, Liang Lu 0001, Changliang Liu, Yifan Gong 0001, 
Improving Layer Trajectory LSTM with Future Context Frames.
Interspeech2019
Liang Lu 0001, Eric Sun, Yifan Gong 0001, 
Self-Teaching Networks.
ICASSP2018
Kalpesh Krishna, Liang Lu 0001, Kevin Gimpel, Karen Livescu, 
A Study of All-Convolutional Encoders for Connectionist Temporal Classification.
ICASSP2017
Liang Lu 0001, Michelle Guo, Steve Renals, 
Knowledge distillation for small-footprint highway networks.
Interspeech2017
Liang Lu 0001, Lingpeng Kong, Chris Dyer, Noah A. Smith, 
Multitask Learning with CTC and Segmental CRF for Speech Recognition.
Interspeech2017
Shubham Toshniwal, Hao Tang 0002, Liang Lu 0001, Karen Livescu, 
Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition.
TASLP2022
Ziyi Xu, Maximilian Strake, Tim Fingscheidt, 
Deep Noise Suppression Maximizing Non-Differentiable PESQ Mediated by a Non-Intrusive PESQNet.
ICASSP2022
Jan Franzen, Tim Fingscheidt, 
Deep Residual Echo Suppression and Noise Reduction: A Multi-Input FCRN Approach in a Hybrid Speech Enhancement System.
ICASSP2021
Jan Franzen, Ernst Seidel, Tim Fingscheidt, 
AEC in A Netshell: on Target and Topology Choices for FCRN Acoustic Echo Cancellation.
Interspeech2021
Timo Lohrenz, Zhengyang Li, Tim Fingscheidt, 
Multi-Encoder Learning and Stream Fusion for Transformer-Based End-to-End Automatic Speech Recognition.
Interspeech2021
Ernst Seidel, Jan Franzen, Maximilian Strake, Tim Fingscheidt, 
Y2-Net FCRN for Acoustic Echo and Noise Suppression.
Interspeech2021
Ziyi Xu, Maximilian Strake, Tim Fingscheidt, 
Deep Noise Suppression with Non-Intrusive PESQNet Supervision Enabling the Use of Real Training Data.
ICASSP2020
Jan Baumann, Timo Lohrenz, Alexander Roy, Tim Fingscheidt, 
Beyond the Dcase 2017 Challenge on Rare Sound Event Detection: A Proposal for a More Realistic Training and Test Framework.
ICASSP2020
Maximilian Strake, Bruno Defraene, Kristoff Fluyt, Wouter Tirry, Tim Fingscheidt, 
Fully Convolutional Recurrent Networks for Speech Enhancement.
ICASSP2020
Ziyi Xu, Samy Elshamy, Tim Fingscheidt, 
Using Separate Losses for Speech and Noise in Mask-Based Speech Enhancement.
Interspeech2020
Timo Lohrenz, Tim Fingscheidt, 
BLSTM-Driven Stream Fusion for Automatic Speech Recognition: Novel Methods and a Multi-Size Window Fusion Example.
Interspeech2020
Maximilian Strake, Bruno Defraene, Kristoff Fluyt, Wouter Tirry, Tim Fingscheidt, 
INTERSPEECH 2020 Deep Noise Suppression Challenge: A Fully Convolutional Recurrent Network (FCRN) for Joint Dereverberation and Denoising.
TASLP2019
Johannes Abel, Tim Fingscheidt, 
Sinusoidal-Based Lowband Synthesis for Artificial Speech Bandwidth Extension.
TASLP2019
Ziyue Zhao, Huijun Liu 0001, Tim Fingscheidt, 
Convolutional Neural Networks to Enhance Coded Speech.
TASLP2018
Samy Elshamy, Nilesh Madhu, Wouter Tirry, Tim Fingscheidt, 
DNN-Supported Speech Enhancement With Cepstral Estimation of Both Excitation and Envelope.
ICASSP2018
Johannes Abel, Maximilian Strake, Tim Fingscheidt, 
A Simple Cepstral Domain DNN Approach to Artificial Speech Bandwidth Extension.
ICASSP2018
Patrick Meyer, Rolf Jongebloed, Tim Fingscheidt, 
Multichannel Speaker Activity Detection for Meetings.
ICASSP2018
Ziyi Xu, Samy Elshamy, Tim Fingscheidt, 
A Priori SNR Estimation Using Discriminative Non-Negative Matrix Factorization.
Interspeech2018
Patrick Meyer, Eric Buschermöhle, Tim Fingscheidt, 
What Do Classifiers Actually Learn? a Case Study on Emotion Recognition Datasets.
TASLP2017
Johannes Abel, Magdalena Kaniewska, Cyril Guillaume, Wouter Tirry, Tim Fingscheidt, 
An Instrumental Quality Measure for Artificially Bandwidth-Extended Speech Signals.
Interspeech2017
Jan Franzen, Tim Fingscheidt, 
A Delay-Flexible Stereo Acoustic Echo Cancellation for DFT-Based In-Car Communication (ICC) Systems.
TASLP2022
Mengzhe Geng, Xurong Xie, Zi Ye, Tianzi Wang, Guinan Li, Shujie Hu, Xunying Liu, Helen Meng, 
Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition.
TASLP2022
Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.
Interspeech2022
Mingyu Cui, Jiajun Deng, Shoukang Hu, Xurong Xie, Tianzi Wang, Shujie Hu, Mengzhe Geng, Boyang Xue, Xunying Liu, Helen Meng, 
Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems.
Interspeech2022
Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Mengzhe Geng, Guinan Li, Xunying Liu, Helen Meng, 
Confidence Score Based Conformer Speaker Adaptation for Speech Recognition.
Interspeech2022
Jin Li, Rongfeng Su, Xurong Xie, Lan Wang, Nan Yan, 
A Multi-level Acoustic Feature Extraction Framework for Transformer Based End-to-End Speech Recognition.
TASLP2021
Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng, 
Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition.
TASLP2021
Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng, 
Recent Progress in the CUHK Dysarthric Speech Recognition System.
TASLP2021
Xurong Xie, Xunying Liu, Tan Lee, Lan Wang, 
Bayesian Learning for Deep Neural Network Adaptation.
ICASSP2021
Shoukang Hu, Xurong Xie, Shansong Liu, Mingyu Cui, Mengzhe Geng, Xunying Liu, Helen Meng, 
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.
ICASSP2021
Zi Ye, Shoukang Hu, Jinchao Li, Xurong Xie, Mengzhe Geng, Jianwei Yu, Junhao Xu, Boyang Xue, Shansong Liu, Xunying Liu, Helen Meng, 
Development of the Cuhk Elderly Speech Recognition System for Neurocognitive Disorder Detection Using the Dementiabank Corpus.
Interspeech2021
Jiajun Deng, Fabian Ritter Gutierrez, Shoukang Hu, Mengzhe Geng, Xurong Xie, Zi Ye, Shansong Liu, Jianwei Yu, Xunying Liu, Helen Meng, 
Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and Dysarthric Speech Recognition.
Interspeech2021
Mengzhe Geng, Shansong Liu, Jianwei Yu, Xurong Xie, Shoukang Hu, Zi Ye, Zengrui Jin, Xunying Liu, Helen Meng, 
Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition.
Interspeech2021
Zengrui Jin, Mengzhe Geng, Xurong Xie, Jianwei Yu, Shansong Liu, Xunying Liu, Helen Meng, 
Adversarial Data Augmentation for Disordered Speech Recognition.
Interspeech2021
Xurong Xie, Rukiye Ruzi, Xunying Liu, Lan Wang, 
Variational Auto-Encoder Based Variability Encoding for Dysarthric Speech Recognition.
Interspeech2020
Mengzhe Geng, Xurong Xie, Shansong Liu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng, 
Investigation of Data Augmentation Techniques for Disordered Speech Recognition.
Interspeech2020
Shansong Liu, Xurong Xie, Jianwei Yu, Shoukang Hu, Mengzhe Geng, Rongfeng Su, Shi-Xiong Zhang, Xunying Liu, Helen Meng, 
Exploiting Cross-Domain Visual Feature Generation for Disordered Speech Recognition.
ICASSP2019
Shoukang Hu, Max W. Y. Lam, Xurong Xie, Shansong Liu, Jianwei Yu, Xixin Wu, Xunying Liu, Helen Meng, 
Bayesian and Gaussian Process Neural Networks for Large Vocabulary Continuous Speech Recognition.
ICASSP2019
Xurong Xie, Xunying Liu, Tan Lee, Shoukang Hu, Lan Wang, 
BLHUC: Bayesian Learning of Hidden Unit Contributions for Deep Neural Network Speaker Adaptation.
Interspeech2019
Shoukang Hu, Xurong Xie, Shansong Liu, Max W. Y. Lam, Jianwei Yu, Xixin Wu, Xunying Liu, Helen Meng, 
LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition.
Interspeech2019
Xurong Xie, Xunying Liu, Tan Lee, Lan Wang, 
Fast DNN Acoustic Model Speaker Adaptation by Learning Hidden Unit Contribution Features.
ICASSP2022
Chao Xie, Yi-Chiao Wu, Patrick Lumban Tobing, Wen-Chin Huang, Tomoki Toda, 
Direct Noisy Speech Modeling for Noisy-To-Noisy Voice Conversion.
Interspeech2022
Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda, 
Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation.
TASLP2021
Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda, 
Pretraining Techniques for Sequence-to-Sequence Voice Conversion.
TASLP2021
Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda, 
Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.
TASLP2021
Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda, 
Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.
ICASSP2021
Wen-Chin Huang, Yi-Chiao Wu, Tomoki Hayashi, 
Any-to-One Sequence-to-Sequence Voice Conversion Using Self-Supervised Discrete Speech Representations.
ICASSP2021
Kazuhiro Kobayashi, Wen-Chin Huang, Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda, 
Crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder.
Interspeech2021
Yi-Chiao Wu, Cheng-Hung Hu, Hung-Shin Lee, Yu-Huai Peng, Wen-Chin Huang, Yu Tsao 0001, Hsin-Min Wang, Tomoki Toda, 
Relational Data Selection for Data Augmentation of Speaker-Dependent Multi-Band MelGAN Vocoder.
Interspeech2021
Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda, 
Unified Source-Filter GAN: Unified Source-Filter Network Based On Factorization of Quasi-Periodic Parallel WaveGAN.
ICASSP2020
Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, 
Efficient Shallow WaveNet Vocoder Using Multiple Samples Output Based on Laplacian Distribution and Linear Prediction.
Interspeech2020
Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda, 
Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining.
Interspeech2020
Patrick Lumban Tobing, Tomoki Hayashi, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda, 
Cyclic Spectral Modeling for Unsupervised Unit Discovery into Voice Conversion with Excitation and Waveform Modeling.
Interspeech2020
Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda, 
Quasi-Periodic Parallel WaveGAN Vocoder: A Non-Autoregressive Pitch-Dependent Dilated Convolution Model for Parametric Speech Generation.
Interspeech2020
Yi-Chiao Wu, Patrick Lumban Tobing, Kazuki Yasuhara, Noriyuki Matsunaga, Yamato Ohtani, Tomoki Toda, 
A Cyclical Post-Filtering Approach to Mismatch Refinement of Neural Vocoder for Text-to-Speech Systems.
ICASSP2019
Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, 
Voice Conversion with Cyclic Recurrent Neural Network and Fine-tuned WaveNet Vocoder.
Interspeech2019
Wen-Chin Huang, Yi-Chiao Wu, Chen-Chou Lo, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao 0001, Hsin-Min Wang, 
Investigation of F0 Conditioning and Fully Convolutional Networks in Variational Autoencoder Based Voice Conversion.
Interspeech2019
Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, 
Non-Parallel Voice Conversion with Cyclic Variational Autoencoder.
Interspeech2019
Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda, 
Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation.
Interspeech2018
Yu-Huai Peng, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao 0001, Hsin-Min Wang, 
Exemplar-Based Spectral Detail Compensation for Voice Conversion.
Interspeech2018
Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Hayashi, Patrick Lumban Tobing, Tomoki Toda, 
Collapsed Speech Segment Detection and Suppression for WaveNet Vocoder.
ICASSP2022
Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.
ICASSP2022
Xintao Zhao, Feng Liu, Changhe Song, Zhiyong Wu 0001, Shiyin Kang, Deyi Tuo, Helen Meng, 
Disentangling Content and Fine-Grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion.
Interspeech2022
Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu 0001, Helen Meng, 
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information.
Interspeech2022
Shun Lei, Yixuan Zhou, Liyang Chen, Jiankun Hu, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.
Interspeech2022
Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information.
TASLP2021
Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Shiyin Kang, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Exemplar-Based Emotive Speech Synthesis.
ICASSP2021
Jie Wang, Yuren You, Feng Liu, Deyi Tuo, Shiyin Kang, Zhiyong Wu 0001, Helen Meng, 
The Huya Multi-Speaker and Multi-Style Speech Synthesis System for M2voc Challenge 2020.
Interspeech2021
Hui Lu, Zhiyong Wu 0001, Xixin Wu, Xu Li, Shiyin Kang, Xunying Liu, Helen Meng, 
VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis.
Interspeech2021
Jie Wang, Jingbei Li, Xintao Zhao, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Adversarially Learning Disentangled Speech Representations for Robust Multi-Factor Voice Conversion.
ICASSP2020
Yuewen Cao, Songxiang Liu, Xixin Wu, Shiyin Kang, Peng Liu, Zhiyong Wu 0001, Xunying Liu, Dan Su 0002, Dong Yu 0001, Helen Meng, 
Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.
ICASSP2020
Songxiang Liu, Disong Wang, Yuewen Cao, Lifa Sun, Xixin Wu, Shiyin Kang, Zhiyong Wu 0001, Xunying Liu, Dan Su 0002, Dong Yu 0001, Helen Meng, 
End-To-End Accent Conversion Without Using Native Utterances.
ICASSP2020
Jianwei Yu, Shi-Xiong Zhang, Jian Wu 0027, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu 0001, 
Audio-Visual Recognition of Overlapped Speech for the LRS2 Dataset.
Interspeech2020
Songxiang Liu, Yuewen Cao, Shiyin Kang, Na Hu, Xunying Liu, Dan Su 0002, Dong Yu 0001, Helen Meng, 
Transferring Source Style in Non-Parallel Voice Conversion.
Interspeech2020
Chengzhu Yu, Heng Lu 0004, Na Hu, Meng Yu 0003, Chao Weng, Kun Xu 0005, Peng Liu, Deyi Tuo, Shiyin Kang, Guangzhi Lei, Dan Su 0002, Dong Yu 0001, 
DurIAN: Duration Informed Attention Network for Speech Synthesis.
ICASSP2019
Hui Lu, Zhiyong Wu 0001, Runnan Li, Shiyin Kang, Jia Jia 0001, Helen Meng, 
A Compact Framework for Voice Conversion Using WaveNet Conditioned on Phonetic Posteriorgrams.
ICASSP2019
Mu Wang, Xixin Wu, Zhiyong Wu 0001, Shiyin Kang, Deyi Tuo, Guangzhi Li, Dan Su 0002, Dong Yu 0001, Helen Meng, 
Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis.
Interspeech2019
Dongyang Dai, Zhiyong Wu 0001, Shiyin Kang, Xixin Wu, Jia Jia 0001, Dan Su 0002, Dong Yu 0001, Helen Meng, 
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT.
Interspeech2019
Shen Huang, Bojie Hu, Shan Huang, Pengfei Hu, Jian Kang 0006, Zhiqiang Lv, Jinghao Yan, Qi Ju, Shiyin Kang, Deyi Tuo, Guangzhi Li, Nurmemet Yolwas, 
Multimedia Simultaneous Translation System for Minority Language Communication with Mandarin.
Interspeech2019
Hui Lu, Zhiyong Wu 0001, Dongyang Dai, Runnan Li, Shiyin Kang, Jia Jia 0001, Helen Meng, 
One-Shot Voice Conversion with Global Speaker Embeddings.
ICASSP2018
Xixin Wu, Lifa Sun, Shiyin Kang, Songxiang Liu, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Feature Based Adaptation for Speaking Style Synthesis.
ICASSP2022
Ke Chen 0021, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov, 
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection.
ICASSP2022
Minglun Han, Linhao Dong, Zhenlin Liang, Meng Cai, Shiyu Zhou, Zejun Ma, Bo Xu 0002, 
Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection.
ICASSP2022
Shaoshi Ling, Chen Shen 0011, Meng Cai, Zejun Ma, 
Improving Pseudo-Label Training For End-To-End Speech Recognition Using Gradient Mask.
ICASSP2022
Yizhou Lu, Mingkun Huang, Xinghua Qu, Pengfei Wei, Zejun Ma, 
Language Adaptive Cross-Lingual Speech Representation Learning with Sparse Sharing Sub-Networks.
ICASSP2022
Chen Shen 0011, Yi Liu, Wenzhi Fan, Bin Wang, Shixue Wen, Yao Tian, Jun Zhang, Jingsheng Yang, Zejun Ma, 
The Volcspeech System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
Interspeech2022
Zhiyun Fan, Zhenlin Liang, Linhao Dong, Yi Liu, Shiyu Zhou, Meng Cai, Jun Zhang, Zejun Ma, Bo Xu, 
Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire.
Interspeech2022
Kaiqi Fu, Shaojun Gao, Xiaohai Tian, Wei Li 0012, Zejun Ma, 
Using Fluency Representation Learned from Sequential Raw Features for Improving Non-native Fluency Scoring.
Interspeech2022
Junfeng Hou, Jinkun Chen, Wanyu Li, Yufeng Tang, Jun Zhang, Zejun Ma, 
Bring dialogue-context into RNN-T for streaming ASR.
Interspeech2022
Yufei Liu, Rao Ma, Haihua Xu, Yi He, Zejun Ma, Weibin Zhang, 
Internal Language Model Estimation Through Explicit Context Vector Learning for Attention-based Encoder-decoder ASR.
Interspeech2022
Xiaohai Tian, Kaiqi Fu, Shaojun Gao, Yiwei Gu, Kai Wang, Wei Li, Zejun Ma, 
A Transfer and Multi-Task Learning based Approach for MOS Prediction.
Interspeech2022
Chao Wang, Zhonghao Li, Benlai Tang, Xiang Yin 0006, Yuan Wan, Yibiao Yu, Zejun Ma, 
Towards high-fidelity singing voice conversion with acoustic reference and contrastive predictive coding.
AAAI2022
Ke Chen 0021, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov, 
Zero-Shot Audio Source Separation through Query-Based Learning from Weakly-Labeled Data.
KDD2022
Xinghua Qu, Pengfei Wei, Mingyong Gao, Zhu Sun, Yew Soon Ong, Zejun Ma, 
Synthesising Audio Adversarial Examples for Automatic Speech Recognition.
ICASSP2021
Yongwei Gao, Xingjian Du, Bilei Zhu, Xiaoheng Sun, Wei Li 0012, Zejun Ma, 
An Hrnet-Blstm Model With Two-Stage Training For Singing Melody Extraction.
ICASSP2021
Yuanbo Hou, Yi Deng, Bilei Zhu, Zejun Ma, Dick Botteldooren, 
Rule-Embedded Network for Audio-Visual Voice Activity Detection in Live Musical Video Streams.
ICASSP2021
Zhonghao Li, Benlai Tang, Xiang Yin 0006, Yuan Wan, Ling Xu, Chen Shen 0011, Zejun Ma, 
PPG-Based Singing Voice Conversion with Adversarial Representation Learning.
ICASSP2021
Junjie Pan, Lin Wu, Xiang Yin 0006, Pengfei Wu, Chenchang Xu, Zejun Ma, 
A Chapter-Wise Understanding System for Text-To-Speech in Chinese Novels.
ICASSP2021
Yao Tian, Haitao Yao, Meng Cai, Yaming Liu, Zejun Ma, 
Improving RNN Transducer Modeling for Small-Footprint Keyword Spotting.
Interspeech2021
Xianzhao Chen, Hao Ni, Yi He, Kang Wang, Zejun Ma, Zongxia Xie, 
Emitting Word Timings with HMM-Free End-to-End System in Automatic Speech Recognition.
Interspeech2021
Yuanbo Hou, Zhesong Yu, Xia Liang, Xingjian Du, Bilei Zhu, Zejun Ma, Dick Botteldooren, 
Attention-Based Cross-Modal Fusion for Audio-Visual Voice Activity Detection in Musical Video Streams.
ICASSP2022
Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert, 
Pseudo-Labeling for Massively Multilingual Speech Recognition.
ICASSP2022
Vineel Pratap, Qiantong Xu, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert, 
Word Order does not Matter for Speech Recognition.
ICASSP2021
Chaitanya Talnikar, Tatiana Likhomanenko, Ronan Collobert, Gabriel Synnaeve, 
Joint Masked CPC And CTC Training For ASR.
ICASSP2021
Qiantong Xu, Alexei Baevski, Tatiana Likhomanenko, Paden Tomasello, Alexis Conneau, Ronan Collobert, Gabriel Synnaeve, Michael Auli, 
Self-Training and Pre-Training are Complementary for Speech Recognition.
Interspeech2021
Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee 0001, Ronan Collobert, Gabriel Synnaeve, Michael Auli, 
Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training.
Interspeech2021
Tatiana Likhomanenko, Qiantong Xu, Jacob Kahn, Gabriel Synnaeve, Ronan Collobert, 
slimIPL: Language-Model-Free Iterative Pseudo-Labeling.
Interspeech2021
Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Paden Tomasello, Jacob Kahn, Gilad Avidov, Ronan Collobert, Gabriel Synnaeve, 
Rethinking Evaluation in ASR: Are Our Models Robust Enough?
ICASSP2020
Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdelrahman Mohamed, Emmanuel Dupoux, 
Libri-Light: A Benchmark for ASR with Limited or No Supervision.
ICASSP2020
Andros Tjandra, Chunxi Liu, Frank Zhang 0001, Xiaohui Zhang, Yongqiang Wang 0005, Gabriel Synnaeve, Satoshi Nakamura 0001, Geoffrey Zweig, 
DEJA-VU: Double Feature Presentation and Iterated Loss in Deep Transformer Networks.
Interspeech2020
Alexandre Défossez, Gabriel Synnaeve, Yossi Adi, 
Real Time Speech Enhancement in the Waveform Domain.
Interspeech2020
Da-Rong Liu, Chunxi Liu, Frank Zhang 0001, Gabriel Synnaeve, Yatharth Saraf, Geoffrey Zweig, 
Contextualizing ASR Lattice Rescoring with Hybrid Pointer Network Language Model.
Interspeech2020
Vineel Pratap, Anuroop Sriram, Paden Tomasello, Awni Y. Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert, 
Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters.
Interspeech2020
Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Y. Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert, 
Scaling Up Online Speech Recognition Using ConvNets.
Interspeech2020
Vineel Pratap, Qiantong Xu, Anuroop Sriram, Gabriel Synnaeve, Ronan Collobert, 
MLS: A Large-Scale Multilingual Dataset for Speech Research.
Interspeech2020
Qiantong Xu, Tatiana Likhomanenko, Jacob Kahn, Awni Y. Hannun, Gabriel Synnaeve, Ronan Collobert, 
Iterative Pseudo-Labeling for Speech Recognition.
ICML2020
Ronan Collobert, Awni Y. Hannun, Gabriel Synnaeve, 
Word-Level Speech Recognition With a Letter to Word Encoder.
ICASSP2019
Yossi Adi, Neil Zeghidour, Ronan Collobert, Nicolas Usunier, Vitaliy Liptchinsky, Gabriel Synnaeve, 
To Reverse the Gradient or Not: an Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition.
ICASSP2019
Vineel Pratap, Awni Y. Hannun, Qiantong Xu, Jeff Cai, Jacob Kahn, Gabriel Synnaeve, Vitaliy Liptchinsky, Ronan Collobert, 
Wav2Letter++: A Fast Open-source Speech Recognition System.
Interspeech2019
Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert, 
Who Needs Words? Lexicon-Free Speech Recognition.
ICASSP2018
Neil Zeghidour, Nicolas Usunier, Iasonas Kokkinos, Thomas Schatz, Gabriel Synnaeve, Emmanuel Dupoux, 
Learning Filterbanks from Raw Speech for Phone Recognition.
ICASSP2022
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.
ICML2022
Chung-Cheng Chiu, James Qin, Yu Zhang, Jiahui Yu, Yonghui Wu, 
Self-supervised learning with random-projection quantizer for speech recognition.
ICASSP2021
Thibault Doutre, Wei Han 0002, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang 0033, Liangliang Cao, 
Improving Streaming Automatic Speech Recognition with Non-Streaming Model Distillation on Unsupervised Data.
ICASSP2021
Bo Li 0028, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han 0002, Qiao Liang, Yu Zhang 0033, Trevor Strohman, Yonghui Wu, 
A Better and Faster end-to-end Model for Streaming ASR.
ICASSP2021
Jiahui Yu, Chung-Cheng Chiu, Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han 0002, Anmol Gulati, Yonghui Wu, Ruoming Pang, 
FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.
Interspeech2021
Thibault Doutre, Wei Han 0002, Chung-Cheng Chiu, Ruoming Pang, Olivier Siohan, Liangliang Cao, 
Bridging the Gap Between Streaming and Non-Streaming ASR Systems by Distilling Ensembles of CTC and RNN-T Models.
Interspeech2021
Edwin G. Ng, Chung-Cheng Chiu, Yu Zhang, William Chan, 
Pushing the Limits of Non-Autoregressive Speech Recognition.
Interspeech2021
Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Ruoming Pang, David Rybach, Cyril Allauzen, Ehsan Variani, James Qin, Quoc-Nam Le-The, Shuo-Yiin Chang, Bo Li 0028, Anmol Gulati, Jiahui Yu, Chung-Cheng Chiu, Diamantino Caseiro, Wei Li 0133, Qiao Liang, Pat Rondon, 
An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling.
ICLR2021
Jiahui Yu, Wei Han 0002, Anmol Gulati, Chung-Cheng Chiu, Bo Li 0028, Tara N. Sainath, Yonghui Wu, Ruoming Pang, 
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling.
ICASSP2020
Daniel S. Park, Yu Zhang 0033, Chung-Cheng Chiu, Youzheng Chen, Bo Li 0028, William Chan, Quoc V. Le, Yonghui Wu, 
SpecAugment on Large Scale Datasets.
ICASSP2020
Tara N. Sainath, Yanzhang He, Bo Li 0028, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li 0133, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alexander Gruenstein, Ke Hu, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirkó Visontai, Yonghui Wu, Yu Zhang 0033, Ding Zhao, 
A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency.
ICASSP2020
Tara N. Sainath, Ruoming Pang, Ron J. Weiss, Yanzhang He, Chung-Cheng Chiu, Trevor Strohman, 
An Attention-Based Joint Acoustic and Text on-Device End-To-End Model.
Interspeech2020
Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang 0033, Jiahui Yu, Wei Han 0002, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang, 
Conformer: Convolution-augmented Transformer for Speech Recognition.
Interspeech2020
Wei Han 0002, Zhengdong Zhang, Yu Zhang 0033, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu, 
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context.
Interspeech2020
Wei Li 0133, James Qin, Chung-Cheng Chiu, Ruoming Pang, Yanzhang He, 
Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition.
Interspeech2020
Daniel S. Park, Yu Zhang 0033, Ye Jia, Wei Han 0002, Chung-Cheng Chiu, Bo Li 0028, Yonghui Wu, Quoc V. Le, 
Improved Noisy Student Training for Automatic Speech Recognition.
Interspeech2019
Daniel S. Park, William Chan, Yu Zhang 0033, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le, 
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition.
Interspeech2019
Tara N. Sainath, Ruoming Pang, David Rybach, Yanzhang He, Rohit Prabhavalkar, Wei Li 0133, Mirkó Visontai, Qiao Liang, Trevor Strohman, Yonghui Wu, Ian McGraw, Chung-Cheng Chiu, 
Two-Pass End-to-End Speech Recognition.
ICASSP2018
Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Ekaterina Gonina, Navdeep Jaitly, Bo Li 0028, Jan Chorowski, Michiel Bacchiani, 
State-of-the-Art Speech Recognition with Sequence-to-Sequence Models.
ICASSP2018
Dieterich Lawson, Chung-Cheng Chiu, George Tucker, Colin Raffel, Kevin Swersky, Navdeep Jaitly, 
Learning Hard Alignments with Variational Inference.
SpeechComm2022
Hemant Kumar Kathania, Sudarsana Reddy Kadiri, Paavo Alku, Mikko Kurimo, 
A formant modification method for improved ASR of children's speech.
Interspeech2022
Yaroslav Getman, Ragheb Al-Ghezi, Katja Voskoboinik, Tamás Grósz, Mikko Kurimo, Giampiero Salvi, Torbjørn Svendsen, Sofia Strömbergsson, 
wav2vec2-based Speech Rating System for Children with Speech Sound Disorder.
Interspeech2022
Georgios Karakasidis, Tamás Grósz, Mikko Kurimo, 
Comparison and Analysis of New Curriculum Criteria for End-to-End ASR.
Interspeech2022
Aku Rouhe, Anja Virkkunen, Juho Leinonen 0002, Mikko Kurimo, 
Low Resource Comparison of Attention-based and Hybrid ASR Exploiting wav2vec 2.0.
Interspeech2021
Ragheb Al-Ghezi, Yaroslav Getman, Aku Rouhe, Raili Hildén, Mikko Kurimo, 
Self-Supervised End-to-End ASR for Low Resource L2 Swedish.
ICASSP2020
Hemant Kumar Kathania, Sudarsana Reddy Kadiri, Paavo Alku, Mikko Kurimo, 
Study of Formant Modification for Children's ASR.
ICASSP2020
Aku Rouhe, Tuomas Kaseva, Mikko Kurimo, 
Speaker-Aware Training of Attention-Based End-to-End Speech Recognition Using Neural Speaker Embeddings.
Interspeech2020
Abhilash Jain, Aku Rouhe, Stig-Arne Grönroos, Mikko Kurimo, 
Finnish ASR with Deep Transformer Models.
Interspeech2020
Hemant Kumar Kathania, Mittul Singh, Tamás Grósz, Mikko Kurimo, 
Data Augmentation Using Prosody and False Starts to Recognize Non-Native Children's Speech.
Interspeech2020
Katri Leino, Juho Leinonen 0002, Mittul Singh, Sami Virpioja, Mikko Kurimo, 
FinChat: Corpus and Evaluation Setup for Finnish Chat Conversations on Everyday Topics.
Interspeech2020
Matias Lindgren, Tommi Jauhiainen, Mikko Kurimo, 
Releasing a Toolkit and Comparing the Performance of Language Embeddings Across Various Spoken Language Identification Datasets.
Interspeech2019
Reima Karhila, Anna-Riikka Smolander, Sari Ylinen, Mikko Kurimo, 
Transparent Pronunciation Scoring Using Articulatorily Weighted Phoneme Edit Distance.
Interspeech2019
Mittul Singh, Sami Virpioja, Peter Smit, Mikko Kurimo, 
Subword RNNLM Approximations for Out-Of-Vocabulary Keyword Search.
Interspeech2018
Aku Rouhe, Reima Karhila, Aija Elg, Minnaleena Toivola, Peter Smit, Anna-Riikka Smolander, Mikko Kurimo, 
Captaina: Integrated Pronunciation Practice and Data Collection Portal.
TASLP2017
Seppo Enarvi, Peter Smit, Sami Virpioja, Mikko Kurimo, 
Automatic Speech Recognition With Very Large Conversational Finnish and Estonian Vocabularies.
ICASSP2017
Md. Akmal Haidar, Mikko Kurimo, 
LDA-based context dependent recurrent neural network language model using document-based topic distribution of words.
Interspeech2017
Reima Karhila, Sari Ylinen, Seppo Enarvi, Kalle J. Palomäki, Aleksander Nikulin, Olli Rantula, Vertti Viitanen, Krupakar Dhinakaran, Anna-Riikka Smolander, Heini Kallio, Katja Junttila, Maria Uther, Perttu Hämäläinen, Mikko Kurimo, 
SIAK - A Game for Foreign Language Pronunciation Learning.
Interspeech2017
André Mansikkaniemi, Peter Smit, Mikko Kurimo, 
Automatic Construction of the Finnish Parliament Speech Corpus.
Interspeech2017
Aku Rouhe, Reima Karhila, Peter Smit, Mikko Kurimo, 
Reading Validation for Pronunciation Evaluation in the Digitala Project.
Interspeech2017
Peter Smit, Sami Virpioja, Mikko Kurimo, 
Improved Subword Modeling for WFST-Based Speech Recognition.
Interspeech2022
Hira Dhamyal, Bhiksha Raj, Rita Singh, 
Positional Encoding for Capturing Modality Specific Cadence for Emotion Detection.
Interspeech2022
Raphaël Olivier, Bhiksha Raj, 
Recent improvements of ASR models in the face of adversarial attacks.
Interspeech2022
Francisco Teixeira, Alberto Abad, Bhiksha Raj, Isabel Trancoso, 
Towards End-to-End Private Automatic Speaker Recognition.
Interspeech2022
Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar 0003, Shinji Watanabe 0001, Bhiksha Raj, 
Improving Speech Enhancement through Fine-Grained Speech Characteristics.
ICASSP2021
Raphaël Olivier, Bhiksha Raj, Muhammad Shah, 
High-Frequency Adversarial Defense for Speech and Audio.
ICASSP2021
Ali Shahin Shamsabadi, Francisco Sepúlveda Teixeira, Alberto Abad, Bhiksha Raj, Andrea Cavallaro, Isabel Trancoso, 
FoolHD: Fooling Speaker Identification by Highly Imperceptible Adversarial Disturbances.
Interspeech2021
Soham Deshmukh, Bhiksha Raj, Rita Singh, 
Improving Weakly Supervised Sound Event Detection with Self-Supervised Auxiliary Tasks.
Interspeech2021
Jiachen Lian, Aiswarya Vinod Kumar, Hira Dhamyal, Bhiksha Raj, Rita Singh, 
Masked Proxy Loss for Text-Independent Speaker Verification.
EMNLP2021
Raphaël Olivier, Bhiksha Raj, 
Sequential Randomized Smoothing for Adversarially Robust Speech Recognition.
Interspeech2020
Hira Dhamyal, Shahan Ali Memon, Bhiksha Raj, Rita Singh, 
The Phonetic Bases of Vocal Expressed Emotion: Natural versus Acted.
Interspeech2020
Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet, 
Hide and Speak: Towards Deep Neural Networks for Speech Steganography.
TASLP2019
Annamaria Mesaros, Aleksandr Diment, Benjamin Elizalde, Toni Heittola, Emmanuel Vincent 0001, Bhiksha Raj, Tuomas Virtanen, 
Sound Event Detection in the DCASE 2017 Challenge.
ICASSP2019
Benjamin Elizalde, Shuayb Zarar, Bhiksha Raj, 
Cross Modal Audio Search and Retrieval with Joint Embeddings Based on Text and Audio.
ICASSP2019
Daanish Ali Khan, Saquib Razak, Bhiksha Raj, Rita Singh, 
Human Behaviour Recognition Using Wifi Channel State Information.
NeurIPS2019
Yandong Wen, Bhiksha Raj, Rita Singh, 
Face Reconstruction from Voice using Generative Adversarial Networks.
ICLR2019
Yandong Wen, Mahmoud Al Ismail, Weiyang Liu, Bhiksha Raj, Rita Singh, 
Disjoint Mapping Network for Cross-modal Matching of Voices and Faces.
ICASSP2018
Yang Gao, Rita Singh, Bhiksha Raj, 
Voice Impersonation Using Generative Adversarial Networks.
ICASSP2018
Yandong Wen, Tianyan Zhou, Rita Singh, Bhiksha Raj, 
A Corrective Learning Approach for Text-Independent Speaker Verification.
Interspeech2018
M. Joana Correia, Bhiksha Raj, Isabel Trancoso, Francisco Teixeira, 
Mining Multimodal Repositories for Speech Affecting Diseases.
ICASSP2017
Anurag Kumar 0003, Bhiksha Raj, Ndapandula Nakashole, 
Discovering sound concepts and acoustic relations in text.