Speech Rankings

A list of researchers in the area of speech, ordered by number of relevant publications, compiled to help identify potential academic supervisors.
Generated at 2023-07-25 22:54:07, arguments: --year_start 2016 --year_end 2023 --author_start_year 1900 --exclude_venue SSW,ASRU,IWSLT,SLT --n_pubs 20 --rank_start 0 --rank_end 200 --output speech_rankings.html
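The arguments above imply a simple filter-and-count ranking: keep publications within --year_start/--year_end, drop the excluded venues, count per author, and keep authors with at least --n_pubs papers. A minimal sketch of that logic (hypothetical code; the real generator parses DBLP/ISCA data, and all names here are illustrative):

```python
# Hypothetical sketch of the ranking logic implied by the generator arguments.
EXCLUDED_VENUES = {"SSW", "ASRU", "IWSLT", "SLT"}

def rank_authors(pubs, year_start=2016, year_end=2023, n_pubs=20):
    """pubs: iterable of (author, venue, year) tuples; returns [(author, count)]."""
    counts = {}
    for author, venue, year in pubs:
        if venue in EXCLUDED_VENUES:
            continue  # --exclude_venue SSW,ASRU,IWSLT,SLT
        if not (year_start <= year <= year_end):
            continue  # --year_start / --year_end window
        counts[author] = counts.get(author, 0) + 1
    # Keep authors meeting the --n_pubs minimum, sorted by count descending.
    ranked = [(a, c) for a, c in counts.items() if c >= n_pubs]
    ranked.sort(key=lambda ac: (-ac[1], ac[0]))
    return ranked
```

The --rank_start/--rank_end arguments would then simply slice this sorted list before rendering.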

#1 | Haizhou Li 0001
Venues: Interspeech: 83; ICASSP: 40; TASLP: 14; SpeechComm: 9
Years: 2022: 20; 2021: 28; 2020: 25; 2019: 24; 2018: 12; 2017: 11; 2016: 26
ISCA Sections: spoken term detection: 6; speech synthesis: 5; source separation: 4; voice conversion and adaptation: 3; speech signal characterization: 3; speech technologies for code-switching in multilingual communities: 3; language recognition: 3; novel models and training methods for asr: 2; speaker recognition: 2; speech enhancement, bandwidth extension and hearing aids: 2; speaker recognition evaluation: 2; special session: 2; resources and annotation of resources: 2; asr: 1; resource-constrained asr: 1; target speaker detection, localization and separation: 1; the first dicova challenge: 1; spoken language understanding: 1; self-supervision and semi-supervision for neural asr training: 1; speech enhancement and intelligibility: 1; robust speaker recognition: 1; feature, embedding and neural architecture for speaker recognition: 1; neural signals for spoken communication: 1; the attacker's perspective on automatic speaker verification: 1; targeted source separation: 1; speech in multimodality: 1; the interspeech 2020 far field speaker verification challenge: 1; speaker recognition challenges and applications: 1; anti-spoofing and liveness detection: 1; asr neural network architectures: 1; cross/multi-lingual and code-switched speech recognition: 1; the interspeech 2019 computational paralinguistics challenge (compare): 1; the 2019 automatic speaker verification spoofing and countermeasures challenge: 1; speaker recognition and anti-spoofing: 1; speech processing and analysis: 1; speaker and language recognition: 1; speech and audio characterization and segmentation: 1; neural waveform generation: 1; the zero resource speech challenge 2019: 1; speech and speaker recognition: 1; speaker recognition and diarization: 1; cross-lingual and multilingual asr: 1; speech and singing production: 1; prosody modeling and generation: 1; voice conversion and speech synthesis: 1; speaker verification: 1; show and tell: 1; source separation from monaural input: 1; multimodal paralinguistics: 1; short utterances speaker recognition: 1; isca medal 2017 ceremony: 1; voice conversion: 1; source separation and voice activity detection: 1; show & tell session: 1; robust speaker recognition and anti-spoofing: 1; automatic learning of representations: 1; feature extraction and acoustic modeling using neural networks for asr: 1
IEEE Keywords: speaker recognition: 23; speech recognition: 19; speech synthesis: 12; natural language processing: 12; text analysis: 6; speaker extraction: 5; music: 4; speaker embedding: 4; pattern classification: 4; emotion recognition: 4; speech intelligibility: 3; speech coding: 3; anti spoofing: 3; security of data: 3; cepstral analysis: 3; voice conversion: 3; gaussian processes: 3; spoken term detection: 3; lyrics transcription: 2; audio signal processing: 2; singing voice separation: 2; hearing: 2; direction of arrival estimation: 2; reverberation: 2; convolutional neural nets: 2; time frequency analysis: 2; speech enhancement: 2; pre training: 2; tacotron: 2; multi task learning: 2; autoencoder: 2; transfer learning: 2; voice conversion (vc): 2; cross lingual: 2; computational linguistics: 2; word processing: 2; graph theory: 2; tts: 2; data augmentation: 2; signal detection: 2; time domain: 2; signal representation: 2; signal reconstruction: 2; covariance matrices: 2; speaker verification: 2; probability: 2; sparse representation: 2; exemplar: 2; mixture models: 2; estimation theory: 2; keyword spotting: 2; singing skill evaluation: 1; singing voice: 1; singing information processing: 1; lyrics synchronization: 1; singing voice synthesis: 1; target speaker extraction: 1; filtering theory: 1; scenario aware differentiated loss: 1; general speech mixture: 1; sparsely overlapped speech: 1; multi modal: 1; language translation: 1; multilingual: 1; selective auditory attention: 1; cocktail party problem: 1; grammars: 1; globalphone: 1; target language extraction: 1; natural languages: 1; music information retrieval: 1; lyrics transcription of polyphonic music: 1; array signal processing: 1; doa estimation: 1; beamforming: 1; speaker localizer: 1; multi scale frequency channel attention: 1; text independent speaker verification: 1; short utterance: 1; automatic voice over: 1; textual visual attention: 1; rendering (computer graphics): 1; visual text to speech: 1; text detection: 1; voice activity detection: 1; lip speech synchronization: 1; video signal processing: 1; image fusion: 1; self supervised speaker recognition: 1; pseudo label selection: 1; loss gated learning: 1; unsupervised learning: 1; time frequency attention: 1; temporal convolutional network: 1; energy distribution: 1; prompt: 1; multimodal: 1; prosodic phrasing: 1; self attention: 1; phrase break prediction: 1; mongolian speech synthesis: 1; deep learning (artificial intelligence): 1; morphological and phonological features: 1; audio databases: 1; frame and style reconstruction loss: 1; expressive speech synthesis: 1; target speaker verification: 1; single and multi talker speaker verification: 1; human computer interaction: 1; automatic dialogue evaluation: 1; holistic framework: 1; sport: 1; interactive systems: 1; speech based user interfaces: 1; self supervised learning: 1; non parallel: 1; decoding: 1; context vector: 1; text to speech (tts): 1; personalized speech generation: 1; language agnostic: 1; graph neural network: 1; syntax: 1; synthetic speech detection: 1; signal companding: 1; multi stage: 1; signal fusion: 1; representation learning: 1; image recognition: 1; channel attention: 1; speech emotion recognition: 1; convolution: 1; spectro temporal attention: 1; adversarial training: 1; disentangled feature learning: 1; signal denoising: 1; image classification: 1; intent classification: 1; linguistic embeddings: 1; acoustic embeddings: 1; emotional speech dataset: 1; speech emotion recognition (ser): 1; emotional voice conversion: 1; inter singer measures: 1; evaluation of singing quality: 1; music theory motivated measures: 1; evaluation by ranking: 1; self organising feature maps: 1; musical acoustics: 1; multi scale: 1; depth wise separable convolution: 1; autoregressive processes: 1; knowledge distillation: 1; inference mechanisms: 1; chains corpus: 1; speaker characterization: 1; whispered speech: 1; vocal tract constriction: 1; generalized countermeasures: 1; synthetic attacks: 1; asvspoof 2019: 1; replay attacks: 1; wavenet adaptation: 1; singular value decomposition (svd): 1; singular value decomposition: 1; acoustic modeling: 1; lyrics alignment: 1; music genre: 1; automatic speech recognition: 1; speech bandwidth extension: 1; multi scale fusion: 1; sensor fusion: 1; signal restoration: 1; time domain analysis: 1; independent language model: 1; low resource asr: 1; catastrophic forgetting: 1; fine tuning: 1; end to end: 1; text to speech: 1; crosslingual word embedding: 1; code switching: 1; cross lingual embedding: 1; language modelling: 1; linguistics: 1; code switch: 1; band pass filters: 1; asvspoof 2017: 1; automatic speaker verification: 1; spatial differentiation: 1; iir filters: 1; channel bank filters: 1; speech separation: 1; spectrum approximation loss: 1; source separation: 1; phonetic posteriorgram (ppg): 1; average modeling approach (ama): 1; rapid computation: 1; total variability model: 1; fusion: 1; analytic phase: 1; spoken language recognition: 1; long time features: 1; instantaneous frequency: 1; unsupervised domain adaptation: 1; domain adversarial training: 1; laplacian eigenmaps: 1; laplacian probabilistic latent semantic analysis: 1; graph regularization: 1; matrix algebra: 1; topic modeling: 1; data structures: 1; topic segmentation: 1; data reduction: 1; frequency warping: 1; residual compensation: 1; interpolation: 1; channel adaptation: 1; channel prior estimation: 1; probabilistic linear discriminant analysis: 1; multi source speaker verification: 1; pattern matching: 1; pairwise learning: 1; low resource speech processing: 1; bottleneck features: 1; feature adaptation: 1; linear transform: 1; temporal filtering: 1; robust speech recognition: 1; transforms: 1; short duration utterance: 1; content aware local variability: 1; deep neural network (dnn): 1; large vocabulary continuous speech recognition (lvcsr): 1; under resourced languages: 1; spoken term detection (std): 1; automatic speech recognition (asr): 1; pathological speech: 1; regression analysis: 1; acoustic signal detection: 1; support vector machines: 1; multiple kernel models: 1; intelligibility: 1; correlation structure feature: 1; pllr: 1; i vector: 1; hierarchical framework: 1; language identification: 1; timbre: 1; prosody: 1; submodular optimization: 1; active learning: 1; phase: 1; spoofing attack: 1; high dimensional feature: 1; counter measure: 1; spoofing detection: 1; error analysis: 1; direction of arrival: 1; mean square error methods: 1; eigenvector clustering: 1; spatial covariance: 1; eigenvalues and eigenfunctions: 1; pattern clustering: 1; microphone arrays: 1; expectation maximisation algorithm: 1; expectation maximization: 1; query processing: 1; time series: 1; dtw: 1; partial matching: 1; query by example: 1
Most Publications: 2021: 89; 2022: 84; 2020: 79; 2019: 70; 2010: 70

Affiliations
Chinese University of Hong Kong (Shenzhen), China
National University of Singapore, Department of Electrical and Computer Engineering, Singapore
Nanyang Technological University, Singapore (2006 - 2016)
Institute for Infocomm Research, A*STAR, Singapore (2003 - 2016)
University of New South Wales, Sydney, Australia (2011)
University of Eastern Finland, Kuopio, Finland (2009)
South China University of Technology, Guangzhou, China (PhD 1990)

SpeechComm 2022 | Kun Zhou, Berrak Sisman, Rui Liu 0008, Haizhou Li 0001
Emotional voice conversion: Theory, databases and ESD.

SpeechComm 2022 | Hongning Zhu, Kong Aik Lee, Haizhou Li 0001
Discriminative speaker embedding with serialized multi-layer multi-head attention.

TASLP 2022 | Chitralekha Gupta, Haizhou Li 0001, Masataka Goto
Deep Learning Approaches in Topics of Singing Information Processing.

TASLP 2022 | Zexu Pan, Meng Ge, Haizhou Li 0001
USEV: Universal Speaker Extraction With Visual Cue.

ICASSP 2022 | Marvin Borsdorf, Kevin Scheck, Haizhou Li 0001, Tanja Schultz
Experts Versus All-Rounders: Target Language Extraction for Multiple Target Languages.

ICASSP 2022 | Xiaoxue Gao, Chitralekha Gupta, Haizhou Li 0001
Genre-Conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music.

ICASSP 2022 | Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang 0001, Haizhou Li 0001
L-SpEx: Localized Target Speaker Extraction.

ICASSP 2022 | Tianchi Liu 0004, Rohan Kumar Das, Kong Aik Lee, Haizhou Li 0001
MFA: TDNN with Multi-Scale Frequency-Channel Attention for Text-Independent Speaker Verification with Short Utterances.

ICASSP 2022 | Junchen Lu, Berrak Sisman, Rui Liu 0008, Mingyang Zhang 0003, Haizhou Li 0001
Visualtts: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over.

ICASSP 2022 | Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li 0001
Self-Supervised Speaker Recognition with Loss-Gated Learning.

ICASSP 2022 | Qiquan Zhang, Qi Song, Zhaoheng Ni, Aaron Nicolson, Haizhou Li 0001
Time-Frequency Attention for Monaural Speech Enhancement.

ICASSP 2022 | Jinming Zhao, Ruichen Li, Qin Jin, Xinchao Wang, Haizhou Li 0001
Memobert: Pre-Training Model with Prompt-Based Learning for Multimodal Emotion Recognition.

Interspeech 2022 | Rui Liu 0008, Berrak Sisman, Björn W. Schuller, Guanglai Gao, Haizhou Li 0001
Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning.

Interspeech 2022 | Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu 0001, Haizhou Li 0001, Tom Ko, Lirong Dai 0001, Jinyu Li 0001, Yao Qian, Furu Wei
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data.

Interspeech 2022 | Marvin Borsdorf, Kevin Scheck, Haizhou Li 0001, Tanja Schultz
Blind Language Separation: Disentangling Multilingual Cocktail Party Voices by Language.

Interspeech 2022 | Zongyang Du, Berrak Sisman, Kun Zhou, Haizhou Li 0001
Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion.

Interspeech 2022 | Zexu Pan, Meng Ge, Haizhou Li 0001
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction.

Interspeech 2022 | Zeyang Song, Qi Liu, Qu Yang, Haizhou Li 0001
Knowledge distillation for In-memory keyword spotting model.

Interspeech 2022 | Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang 0006, Tom Ko, Haizhou Li 0001
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT.

Interspeech 2022 | Qu Yang, Qi Liu, Haizhou Li 0001
Deep residual spiking neural network for keyword spotting in low-resource settings.

#2 | Shinji Watanabe 0001
Venues: Interspeech: 79; ICASSP: 50; TASLP: 5; ACL: 3; NAACL: 3; ICML: 2
Years: 2022: 39; 2021: 34; 2020: 15; 2019: 24; 2018: 14; 2017: 8; 2016: 8
ISCA Sections: speech synthesis: 4; non-autoregressive sequential modeling for speech processing: 4; speaker diarization: 4; low-resource asr development: 3; spoken language understanding: 2; spoken language processing: 2; novel models and training methods for asr: 2; asr: 2; source separation: 2; neural networks for language modeling: 2; robust speech recognition: 2; adjusting to speaker, accent, and domain: 2; robust asr, and far-field/multi-talker asr: 1; search/decoding algorithms for asr: 1; streaming asr: 1; speech enhancement and intelligibility: 1; neural transducers, streaming asr and novel asr models: 1; speech segmentation: 1; adaptation, transfer learning, and distillation for asr: 1; speech processing & measurement: 1; single-channel and multi-channel speech enhancement: 1; spoken dialogue systems and multimodality: 1; tools, corpora and resources: 1; streaming for asr/rnn transducers: 1; acoustic event detection and acoustic scene classification: 1; low-resource speech recognition: 1; miscellaneous topics in asr: 1; emotion and sentiment analysis: 1; topics in asr: 1; cross/multi-lingual and code-switched asr: 1; speech signal analysis and representation: 1; target speaker detection, localization and separation: 1; single-channel speech enhancement: 1; asr neural network architectures and training: 1; speaker embedding: 1; noise robust and distant speech recognition: 1; sequence-to-sequence speech recognition: 1; asr for noisy and far-field speech: 1; speaker recognition: 1; speaker recognition evaluation: 1; asr neural network training: 1; spoken term detection, confidence measure, and end-to-end speech recognition: 1; asr neural network architectures: 1; speech and voice disorders: 1; search methods for speech recognition: 1; speech technologies for code-switching in multilingual communities: 1; nn architectures for asr: 1; language identification: 1; sequence models for asr: 1; end-to-end speech recognition: 1; the first dihard speech diarization challenge: 1; deep enhancement: 1; recurrent neural models for asr: 1; neural network acoustic models for asr: 1; lexical and pronunciation modeling: 1; noise robust speech recognition: 1; far-field speech processing: 1; spoken language understanding systems: 1; source separation and spatial audio: 1; robustness in speech processing: 1
IEEE Keywords: speech recognition: 45; natural language processing: 15; recurrent neural nets: 10; speaker recognition: 9; end to end speech recognition: 8; decoding: 7; end to end: 7; speech enhancement: 6; source separation: 6; text analysis: 5; encoding: 4; language translation: 4; pattern classification: 4; end to end asr: 4; speech separation: 4; self supervised learning: 4; speech synthesis: 4; signal classification: 4; transformer: 4; connectionist temporal classification: 4; speech coding: 4; audio signal processing: 4; microphone arrays: 4; non autoregressive: 3; autoregressive processes: 3; attention: 3; neural net architecture: 3; convolutional neural nets: 3; hidden markov models: 3; speaker diarization: 2; eend: 2; public domain software: 2; open source: 2; graph theory: 2; ctc: 2; automatic speech recognition: 2; rnn t: 2; computational linguistics: 2; sequence to sequence: 2; cycle consistency: 2; end to end speech translation: 2; pattern clustering: 2; self attention: 2; gaussian processes: 2; entropy: 2; encoder decoder: 2; joint ctc/attention: 2; multiple microphone array: 2; supervised learning: 2; sound event detection: 2; neural beamformer: 2; signal detection: 2; optimisation: 2; iterative methods: 1; inference mechanisms: 1; eda: 1; spoken language understanding: 1; speech based user interfaces: 1; multi speaker overlapped speech: 1; gtc: 1; wfst: 1; misp challenge: 1; audio visual systems: 1; wake word spotting: 1; audio visual: 1; microphone array: 1; ctc/attention speech recognition: 1; fourier transforms: 1; channel bank filters: 1; self supervised speech representation: 1; computer based training: 1; voice conversion: 1; sensor fusion: 1; speech summarization: 1; attention fusion: 1; speech translation: 1; rover: 1; bic: 1; acoustic unit discovery: 1; unit based language model: 1; hubert: 1; interactive systems: 1; asr: 1; gtc t: 1; transducer: 1; linguistic annotation: 1; recurrent neural network: 1; sru++: 1; code switched asr: 1; bilingual asr: 1; software packages: 1; audio processing: 1; open source toolkit: 1; text to speech: 1; python: 1; self supervision: 1; end to end speech processing: 1; conformer: 1; non autoregressive sequence generation: 1; image sequences: 1; conditional masked language model: 1; multiprocessing systems: 1; non autoregressive decoding: 1; long sequence data: 1; multitask learning: 1; search problems: 1; stochastic processes: 1; continuous speech separation: 1; long recording speech separation: 1; dual path modeling: 1; online processing: 1; transforms: 1; deep learning (artificial intelligence): 1; noisy speech: 1; signal denoising: 1; audio recording: 1; diarization: 1; loudspeakers: 1; uncertainty estimation: 1; target speaker speech extraction: 1; target speaker speech recognition: 1; source localization: 1; direction of arrival estimation: 1; multi encoder multi array (mem array): 1; multi encoder multi resolution (mem res): 1; hierarchical attention network (han): 1; permutation invariant training: 1; end to end model: 1; multi talker mixed speech recognition: 1; knowledge distillation: 1; curriculum learning: 1; neural beamforming: 1; overlapped speech recognition: 1; reverberation: 1; lightweight convolution: 1; dynamic convolution: 1; multi stream: 1; two stage training: 1; weakly supervised learning: 1; target speech extraction: 1; signal reconstruction: 1; minimisation: 1; streaming: 1; voice activity detection: 1; ctc greedy search: 1; covariance matrix adaptation evolution strategy (cma es): 1; multi objective optimization: 1; deep neural network (dnn): 1; evolutionary computation: 1; genetic algorithm: 1; pareto optimisation: 1; cloud computing: 1; parallel processing: 1; softmax margin: 1; sequence learning: 1; discriminative training: 1; attention models: 1; beam search training: 1; cold fusion: 1; storage management: 1; deep fusion: 1; shallow fusion: 1; language model: 1; automatic speech recognition (asr): 1; unpaired data: 1; expert systems: 1; vocabulary: 1; low resource language: 1; multilingual speech recognition: 1; transfer learning: 1; acoustic model: 1; semi supervised learning: 1; autoencoder: 1; conditional restricted boltzmann machine: 1; restricted boltzmann machine: 1; unsupervised learning: 1; weakly labeled data: 1; acoustic modeling: 1; kaldi: 1; array signal processing: 1; chime 5 challenge: 1; robust speech recognition: 1; error statistics: 1; speech codecs: 1; stream attention: 1; end to end models: 1; word processing: 1; sub word modeling: 1; multichannel end to end asr: 1; speaker adaptation: 1; attention based encoder decoder: 1; hybrid attention/ctc: 1; language identification: 1; language independent architecture: 1; multilingual asr: 1; speaker independent multi talker speech separation: 1; human computer interaction: 1; cocktail party problem: 1; deep clustering: 1; hidden semi markov model (hsmm): 1; polyphonic sound event detection (sed): 1; hybrid model: 1; recurrent neural network: 1; long short term memory (lstm): 1; duration control: 1; multi task learning: 1; chime 4: 1; student teacher learning: 1; distant talking asr: 1; multi access systems: 1; distance learning: 1; embedding: 1; clustering: 1; probability: 1; maximum likelihood estimation: 1; long short term memory: 1; recurrent neural network language model: 1; minimum word error training: 1; ssnn: 1; i vector: 1; sequence summary: 1; adaptation: 1; dnn: 1; multichannel gmm: 1; deep unfolding: 1; markov random field: 1; mixture models: 1; markov processes: 1
Most Publications: 2022: 134; 2021: 114; 2019: 60; 2020: 55; 2018: 46

Affiliations
Carnegie Mellon University, Pittsburgh, PA, USA
Johns Hopkins University, Baltimore, MD, USA (former)
Mitsubishi Electric Research Laboratories, Cambridge, MA, USA (2012 - 2017)
NTT Communication Science Laboratories, Kyoto, Japan (2001 - 2011)
Waseda University, Tokyo, Japan (PhD 2006)

TASLP 2022 | Shota Horiguchi, Yusuke Fujita, Shinji Watanabe 0001, Yawen Xue, Paola García
Encoder-Decoder Based Attractors for End-to-End Neural Diarization.

ICASSP 2022 | Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan 0003, Brian Yan, Ngoc Thang Vu, Alan W. Black, Shinji Watanabe 0001
ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet.

ICASSP 2022 | Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR.

ICASSP 2022 | Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006
The First Multimodal Information Based Speech Processing (Misp) Challenge: Data, Tasks, Baselines And Results.

ICASSP 2022 | Keqi Deng, Zehui Yang, Shinji Watanabe 0001, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang
Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models.

ICASSP 2022 | Zili Huang, Shinji Watanabe 0001, Shu-Wen Yang, Paola García, Sanjeev Khudanpur
Investigating Self-Supervised Learning for Speech Enhancement and Separation.

ICASSP 2022 | Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe 0001, Tomoki Toda
S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations.

ICASSP 2022 | Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe 0001
Integrating Multiple ASR Systems into NLP Backend with Attention Fusion.

ICASSP 2022 | Takashi Maekaku, Xuankai Chang, Yuya Fujita, Shinji Watanabe 0001
An Exploration of Hubert with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion.

ICASSP 2022 | Niko Moritz, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux
Sequence Transduction with Graph-Based Supervision.

ICASSP 2022 | Motoi Omachi, Yuya Fujita, Shinji Watanabe 0001, Tianzi Wang
Non-Autoregressive End-To-End Automatic Speech Recognition Incorporating Downstream Natural Language Processing.

ICASSP 2022 | Jing Pan, Tao Lei 0001, Kwangyoun Kim, Kyu J. Han, Shinji Watanabe 0001
SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition.

ICASSP 2022 | Brian Yan, Chunlei Zhang, Meng Yu 0003, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe 0001, Dong Yu 0001
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization.

ICASSP 2022 | Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Artyom Astafurov, Caroline Chen, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Z. Yang, Jason Lian, Jeff Hwang, Ji Chen, Peter Goldsborough, Sean Narenthiran, Shinji Watanabe 0001, Soumith Chintala, Vincent Quenneville-Bélair
Torchaudio: Building Blocks for Audio and Speech Processing.

Interspeech 2022 | Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan W. Black, Shinji Watanabe 0001
Two-Pass Low Latency End-to-End Spoken Language Understanding.

Interspeech 2022 | Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel López-Francisco, Jonathan D. Amith, Shinji Watanabe 0001
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation.

Interspeech 2022 | Xuankai Chang, Takashi Maekaku, Yuya Fujita, Shinji Watanabe 0001
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation.

Interspeech 2022 | Hang Chen, Jun Du, Yusheng Dai, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Baocai Yin, Jia Pan
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.

Interspeech 2022 | Keqi Deng, Shinji Watanabe 0001, Jiatong Shi, Siddhant Arora
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation.

Interspeech 2022 | Shuai Guo, Jiatong Shi, Tao Qian, Shinji Watanabe 0001, Qin Jin
SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy.

#3 | Helen M. Meng
Venues: Interspeech: 71; ICASSP: 52; TASLP: 12; IJCAI: 1; SpeechComm: 1; AAAI: 1
Years: 2022: 39; 2021: 29; 2020: 18; 2019: 22; 2018: 14; 2017: 8; 2016: 8
ISCA Sections: speech synthesis: 14; voice conversion and adaptation: 5; speech recognition of atypical speech: 4; topics in asr: 2; spoken term detection: 2; asr neural network architectures: 2; neural techniques for voice conversion and waveform generation: 2; medical applications and visual asr: 2; voice conversion: 2; single-channel speech enhancement: 1; multi-, cross-lingual and other topics in asr: 1; novel models and training methods for asr: 1; atypical speech analysis and detection: 1; multimodal speech emotion recognition and paralinguistics: 1; miscellaneous topics in speech, voice and hearing disorders: 1; speech and language in health: 1; spoofing-aware automatic speaker verification (sasv): 1; zero, low-resource and multi-modal speech recognition: 1; embedding and network architecture for speaker recognition: 1; voice anti-spoofing and countermeasure: 1; non-autoregressive sequential modeling for speech processing: 1; assessment of pathological speech and language: 1; non-native speech: 1; speaker recognition: 1; speech synthesis paradigms and methods: 1; speech in multimodality: 1; asr neural network architectures and training: 1; new trends in self-supervised speech processing: 1; multimodal speech processing: 1; learning techniques for speaker recognition: 1; speech and speaker recognition: 1; speech and audio classification: 1; lexicon and language model for speech recognition: 1; novel neural network architectures for acoustic modelling: 1; second language acquisition and code-switching: 1; emotion recognition and analysis: 1; plenary talk: 1; expressive speech synthesis: 1; deep learning for source separation and pitch tracking: 1; application of asr in medical practice: 1; prosody and text processing: 1; emotion modeling: 1; short utterances speaker recognition: 1; behavioral signal processing and speaker state and traits analytics: 1; spoken documents, spoken understanding and semantic analysis: 1
IEEE Keywords: speech recognition: 35; speech synthesis: 20; speaker recognition: 19; natural language processing: 19; recurrent neural nets: 16; emotion recognition: 8; speech coding: 7; optimisation: 7; speech emotion recognition: 7; bayes methods: 6; deep learning (artificial intelligence): 6; text analysis: 6; speech separation: 5; voice conversion: 5; neural architecture search: 4; security of data: 4; gaussian processes: 4; quantisation (signal): 4; voice activity detection: 4; entropy: 4; multi channel: 4; overlapped speech: 4; text to speech: 3; knowledge distillation: 3; multi task learning: 3; audio signal processing: 3; speaker verification: 3; adversarial attack: 3; vocoders: 3; speech intelligibility: 3; variational inference: 3; language models: 3; transformer: 3; gradient methods: 3; convolutional neural nets: 3; disordered speech recognition: 2; handicapped aids: 2; speaker adaptation: 2; time delay neural network: 2; bayesian learning: 2; trees (mathematics): 2; expressive speech synthesis: 2; speaking style: 2; audio visual systems: 2; audio visual: 2; dysarthric speech reconstruction: 2; biometrics (access control): 2; anti spoofing: 2; decoding: 2; speaker diarization: 2; multi look: 2; inference mechanisms: 2; admm: 2; autoregressive processes: 2; word processing: 2; quantization: 2; code switching: 2; human computer interaction: 2; mispronunciation detection and diagnosis: 2; multilingual: 2; bidirectional long short term memory (blstm): 2; cross lingual: 2; elderly speech recognition: 1; neural net architecture: 1; search problems: 1; minimisation: 1; uncertainty handling: 1; supervised learning: 1; pattern classification: 1; adversarial attacks: 1; automatic speaker verification: 1; adversarial defense: 1; self supervised learning: 1; model uncertainty: 1; monte carlo methods: 1; neural language models: 1; computational linguistics: 1; image segmentation: 1; span based decoder: 1; tree structure: 1; character level: 1; prosodic structure prediction: 1; non-standard word: 1; rule based: 1; chinese text normalization: 1; flat lattice transformer: 1; relative position encoding: 1; knowledge based systems: 1; hierarchical: 1; xlnet: 1; speaking style modelling: 1; graph neural network: 1; conversational text to speech synthesis: 1; bidirectional attention mechanism: 1; matrix algebra: 1; hidden markov models: 1; forced alignment: 1; end to end model: 1; speech enhancement: 1; dereverberation and recognition: 1; reverberation: 1; unsupervised learning: 1; multitask learning: 1; speaker change detection: 1; unsupervised speech decomposition: 1; speaker identity: 1; adversarial speaker adaptation: 1; vocoder: 1; uniform sampling: 1; path dropout: 1; partially fake audio detection: 1; audio deep synthesis detection challenge: 1; neural network quantization: 1; mean square error methods: 1; mixed precision: 1; source separation: 1; hybrid bottleneck features: 1; disentangling: 1; cross entropy: 1; connectionist temporal classification: 1; data handling: 1; m2met: 1; feature fusion: 1; direction of arrival: 1; direction of arrival estimation: 1; domain adaptation: 1; gaussian process: 1; lf mmi: 1; delays: 1; generalisation (artificial intelligence): 1; signal sampling: 1; location relative attention: 1; signal representation: 1; signal reconstruction: 1; sequence to sequence modeling: 1; any to many: 1; data augmentation: 1; multimodal speech recognition: 1; residual error: 1; capsule: 1; exemplary emotion descriptor: 1; spatial information: 1; recurrent: 1; capsule network: 1; sequential: 1; low bit quantization: 1; lstm rnn: 1; filtering theory: 1; jointly fine tuning: 1; microphone arrays: 1; visual occlusion: 1; overlapped speech recognition: 1; image recognition: 1; video signal processing: 1; expressive: 1; emotion: 1; global style token: 1; synthetic speech detection: 1; replay detection: 1; res2net: 1; multi scale feature: 1; asv anti spoofing: 1; alzheimer's disease detection: 1; features: 1; cognition: 1; adress: 1; medical diagnostic computing: 1; geriatrics: 1; asr: 1; diseases: 1; signal classification: 1; patient diagnosis: 1; linguistics: 1; non autoregressive: 1; ctc: 1; neural network based text to speech: 1; syntactic parse tree traversal: 1; grammars: 1; prosody control: 1; syntactic representation learning: 1; controllable and efficient: 1; prosody modelling: 1; semi autoregressive: 1; multi speaker and multi style tts: 1; durian: 1; hifi gan: 1; low resource condition: 1; elderly speech: 1; automatic speech recognition: 1; neurocognitive disorder detection: 1; dementia: 1; phonetic posteriorgrams: 1; x vector: 1; gmm i vector: 1; accented speech recognition: 1; accent conversion: 1; cross modal: 1; seq2seq: 1; spatial smoothing: 1; adversarial training: 1; spoofing countermeasure: 1; data compression: 1; recurrent neural networks: 1; alternating direction methods of multipliers: 1; audio visual speech recognition: 1; multi modal: 1; end to end: 1; multilingual speech synthesis: 1; foreign accent: 1; center loss: 1; spectral analysis: 1; discriminative features: 1; activation function selection: 1; gaussian process neural network: 1; bayesian neural network: 1; lstm: 1; neural network language models: 1; parameter estimation: 1; connectionist temporal classification (ctc): 1; convolutional neural network (cnn): 1; e learning: 1; mispronunciation detection and diagnosis (mdd): 1; computer assisted pronunciation training (capt): 1; dilated residual network: 1; multi head self attention: 1; self attention: 1; wavenet: 1; blstm: 1; phonetic posteriorgrams (ppgs): 1; quasi-fully recurrent neural network (qrnn): 1; text to speech (tts) synthesis: 1; parallel wavenet: 1; convolutional neural network (cnn): 1; parallel processing: 1; capsule networks: 1; spatial relationship information: 1; recurrent connection: 1; utterance level features: 1; rnnlms: 1; natural gradient: 1; limited memory bfgs: 1; second order optimization: 1; hessian matrices: 1; recurrent neural network: 1; language model: 1; unsupervised clustering: 1; extended phoneme set in l2 speech: 1; mispronunciation patterns: 1; phonemic posteriorgrams: 1; computer aided pronunciation training: 1; feature representation: 1; acoustic phonemic model: 1; style adaptation: 1; regression analysis: 1; expressiveness: 1; style feature: 1; l2 english speech: 1; mispronunciation detection: 1; mispronunciation diagnosis: 1; deep neural networks: 1; acoustic model: 1; structured output layer: 1; deep bidirectional long short term memory: 1; emphasis detection: 1; deep bidirectional long short term memory (dblstm): 1; talking avatar: 1; low level descriptors (lld): 1; bottleneck feature: 1; bidirectional recurrent neural network (brnn): 1; gated recurrent unit (gru): 1; question detection: 1; low resource: 1
Most Publications: 2022: 114, 2021: 70, 2020: 36, 2019: 33, 2018: 28

Affiliations
The Chinese University of Hong Kong
Massachusetts Institute of Technology, Cambridge, MA, USA (former)

TASLP2022 Mengzhe Geng, Xurong Xie, Zi Ye, Tianzi Wang, Guinan Li, Shujie Hu, Xunying Liu, Helen Meng
Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition.

TASLP2022 Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.

TASLP2022 Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu 0001, Helen Meng, Hung-Yi Lee
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning.

TASLP2022 Boyang Xue, Shoukang Hu, Junhao Xu, Mengzhe Geng, Xunying Liu, Helen Meng
Bayesian Neural Network Language Modeling for Speech Recognition.

ICASSP2022 Xueyuan Chen, Changhe Song, Yixuan Zhou, Zhiyong Wu 0001, Changbin Chen, Zhongqin Wu, Helen Meng
A Character-Level Span-Based Model for Mandarin Prosodic Structure Prediction.

ICASSP2022 Wenlin Dai, Changhe Song, Xiang Li 0067, Zhiyong Wu 0003, Huashan Pan, Xiulin Li, Helen Meng
An End-to-End Chinese Text Normalization Model Based on Rule-Guided Flat-Lattice Transformer.

ICASSP2022 Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu 0001, Shiyin Kang, Helen Meng
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.

ICASSP2022 Jingbei Li, Yi Meng, Chenyi Li, Zhiyong Wu 0001, Helen Meng, Chao Weng, Dan Su 0002
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling.

ICASSP2022 Jingbei Li, Yi Meng, Zhiyong Wu 0001, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang 0002
Neufa: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism.

ICASSP2022 Guinan Li, Jianwei Yu, Jiajun Deng, Xunying Liu, Helen Meng
Audio-Visual Multi-Channel Speech Separation, Dereverberation and Recognition.

ICASSP2022 Hang Su, Danyang Zhao, Long Dang, Minglei Li, Xixin Wu, Xunying Liu, Helen Meng
A Multitask Learning Framework for Speaker Change Detection with Content Information from Unsupervised Speech Decomposition.

ICASSP2022 Disong Wang, Songxiang Liu, Xixin Wu, Hui Lu, Lifa Sun, Xunying Liu, Helen Meng
Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation.

ICASSP2022 Haibin Wu, Po-Chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang 0006, Zhiyong Wu 0001, Helen Meng, Hung-Yi Lee
Adversarial Sample Detection for Speaker Verification by Neural Vocoders.

ICASSP2022 Xixin Wu, Shoukang Hu, Zhiyong Wu 0001, Xunying Liu, Helen Meng
Neural Architecture Search for Speech Emotion Recognition.

ICASSP2022 Haibin Wu, Heng-Cheng Kuo, Naijun Zheng, Kuo-Hsuan Hung, Hung-Yi Lee, Yu Tsao 0001, Hsin-Min Wang, Helen Meng
Partially Fake Audio Detection by Self-Attention-Based Fake Span Discovery.

ICASSP2022 Junhao Xu, Jianwei Yu, Xunying Liu, Helen Meng
Mixed Precision DNN Quantization for Overlapped Speech Separation and Recognition.

ICASSP2022 Xintao Zhao, Feng Liu, Changhe Song, Zhiyong Wu 0001, Shiyin Kang, Deyi Tuo, Helen Meng
Disentangling Content and Fine-Grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion.

ICASSP2022 Naijun Zheng, Na Li 0012, Xixin Wu, Lingwei Meng, Jiawen Kang, Haibin Wu, Chao Weng, Dan Su 0002, Helen Meng
The CUHK-Tencent Speaker Diarization System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.

ICASSP2022 Naijun Zheng, Na Li 0012, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng
Multi-Channel Speaker Diarization Using Spatial Features for Meetings.

Interspeech2022 Jun Chen, Wei Rao, Zilin Wang, Zhiyong Wu 0001, Yannan Wang, Tao Yu, Shidong Shang, Helen Meng
Speech Enhancement with Fullband-Subband Cross-Attention Network.

#4  | Björn W. Schuller | Google Scholar   DBLP
Venues: Interspeech: 89, ICASSP: 19, TASLP: 6, IJCAI: 2, SpeechComm: 1
Years: 2022: 11, 2021: 18, 2020: 22, 2019: 13, 2018: 15, 2017: 16, 2016: 22
ISCA Sectionspecial session: 16speech in health: 4speech emotion recognition: 4speech synthesis: 3the first dicova challenge: 3the interspeech 2020 computational paralinguistics challenge (compare): 3speaker states and traits: 3voice conversion and adaptation: 2multimodal systems: 2the interspeech 2021 computational paralinguistics challenge (compare): 2computational paralinguistics: 2social signals detection and speaker traits analysis: 2attention mechanism for speaker state recognition: 2the interspeech 2018 computational paralinguistics challenge (compare): 2disorders related to speech and language: 2automatic analysis of paralinguistics: 1single-channel speech enhancement: 1atypical speech analysis and detection: 1asr technologies and systems: 1(multimodal) speech emotion recognition: 1pathological speech assessment: 1atypical speech detection: 1diverse modes of speech acquisition and processing: 1health and affect: 1speech type classification and diagnosis: 1speech in multimodality: 1alzheimer’s dementia recognition through spontaneous speech: 1diarization: 1acoustic scene classification: 1bioacoustics and articulation: 1speech enhancement: 1representation learning of emotion and paralinguistics: 1training strategy for speech emotion recognition: 1the interspeech 2019 computational paralinguistics challenge (compare): 1network architectures for emotion and paralinguistics recognition: 1speech signal characterization: 1representation learning for emotion: 1speech and language analytics for mental health: 1text analysis, multilingual issues and evaluation in speech synthesis: 1emotion modeling: 1emotion recognition and analysis: 1speech pathology, depression, and medical applications: 1speaker state and trait: 1second language acquisition and code-switching: 1social signals, styles, and interaction: 1show & tell: 1styles, varieties, forensics and tools: 1pathological speech and language: 1show & tell session: 1speech and audio segmentation and classification: 
1language recognition: 1automatic assessment of emotions: 1
IEEE Keywordemotion recognition: 21speech recognition: 16speech emotion recognition: 7recurrent neural nets: 7attention mechanism: 3audio signal processing: 2pattern classification: 2signal classification: 2human computer interaction: 2psychology: 2lstm: 2multi task learning: 2semi supervised learning: 2end to end learning: 2multi source domain adaptation: 1adversarial learning: 1unsupervised domain adaptation: 1speaker independent: 1healthcare: 1hearing: 1intelligent medicine: 1medical computing: 1computer vision: 1computer audition: 1health care: 1digital phenotype: 1overview: 1domain adaptation: 1speech coding: 1relativistic discriminator: 1decoding: 1speech intelligibility: 1speech enhancement: 1deep neural network: 1maximum mean discrepancy: 1signal representation: 1disentangled representation learning: 1guided representation learning: 1audio generation: 1and generative adversarial neural network: 1iterative methods: 1parkinson's disease: 1glottal source estimation: 1diseases: 1glottal features: 1filtering theory: 1multilayer perceptrons: 1support vector machines: 1end to end systems: 1electroencephalography: 1medical signal processing: 1eeg signals: 1temporal convolutional networks: 1hierarchical attention mechanism: 1arelu: 1relu: 1gated recurrent unit: 1representation learning: 1transfer learning: 1computational linguistics: 1audiotextual information: 1paralinguistic: 1deep learning (artificial intelligence): 1semantic: 1vggish: 1consistent rank logits: 1ordinal classification: 1customer services: 1entropy: 1data protection: 1adversarial attacks: 1convolutional neural network: 1adversarial training: 1convolutional neural nets: 1gradient methods: 1emotional speech synthesis: 1adversarial networks: 1data augmentation: 1unsupervised learning: 1end to end affective computing: 1depression: 1mean square error methods: 1attention transfer: 1hierarchical attention: 1monotonic attention: 1behavioural sciences computing: 1frame level features: 1speech emotion: 1end 
to end: 1speech emotion prediction: 1audio visual systems: 1face recognition: 1emotion classification: 1joint training: 1audiovisual learning: 1emotion regression: 1autoencoders: 1image processing: 1conditional adversarial training: 1generative adversarial network: 1feedforward neural nets: 1affective computing: 1data aggregation: 1deep neural networks: 1market research: 1consumer behaviour: 1speech: 1opensmile: 1marketing research: 1arousal: 1signal reconstruction: 1reconstruction error: 1continuous emotion recognition: 1bidirectional long short term memory: 1performance evaluation: 1raw waveform: 1cnn: 1enhanced semi supervised learning: 1multimodal emotion recognition: 1missing labels: 1holistic speech analysis: 1data enrichment: 1speaker recognition: 1multi target learning: 1
Most Publications: 2022: 126, 2021: 114, 2017: 90, 2020: 84, 2019: 77

Affiliations
Imperial College London, GLAM, UK
University of Augsburg, Department of Computer Science, Germany
University of Passau, Faculty of Computer Science and Mathematics, Germany (former)

TASLP2022 Cheng Lu 0005, Yuan Zong, Wenming Zheng, Yang Li 0019, Chuangao Tang, Björn W. Schuller
Domain Invariant Feature Learning for Speaker-Independent Speech Emotion Recognition.

ICASSP2022 Kun Qian 0003, Tanja Schultz, Björn W. Schuller
An Overview of the FIRST ICASSP Special Session on Computer Audition for Healthcare.

Interspeech2022 Zijiang Yang 0007, Xin Jing, Andreas Triantafyllopoulos, Meishu Song, Ilhan Aslan, Björn W. Schuller
An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion.

Interspeech2022 Rui Liu 0008, Berrak Sisman, Björn W. Schuller, Guanglai Gao, Haizhou Li 0001
Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning.

Interspeech2022 Yi Chang, Zhao Ren, Thanh Tam Nguyen, Wolfgang Nejdl, Björn W. Schuller
Example-based Explanations with Adversarial Attacks for Respiratory Sound Analysis.

Interspeech2022 Jiaming Cheng, Ruiyu Liang, Yue Xie, Li Zhao 0003, Björn W. Schuller, Jie Jia, Yiyuan Peng
Cross-Layer Similarity Knowledge Distillation for Speech Enhancement.

Interspeech2022 Adria Mallol-Ragolta, Helena Cuesta, Emilia Gómez, Björn W. Schuller
Multi-Type Outer Product-Based Fusion of Respiratory Sounds for Detecting COVID-19.

Interspeech2022 Rodrigo Schoburg Carrillo de Mira, Alexandros Haliassos, Stavros Petridis, Björn W. Schuller, Maja Pantic
SVTS: Scalable Video-to-Speech Synthesis.

Interspeech2022 Andreas Triantafyllopoulos, Johannes Wagner 0001, Hagen Wierstorf, Maximilian Schmitt, Uwe Reichel, Florian Eyben, Felix Burkhardt, Björn W. Schuller
Probing speech emotion recognition transformers for linguistic knowledge.

Interspeech2022 Andreas Triantafyllopoulos, Markus Fendler, Anton Batliner, Maurice Gerczuk, Shahin Amiriparian, Thomas M. Berghaus, Björn W. Schuller
Distinguishing between pre- and post-treatment in the speech of patients with chronic obstructive pulmonary disease.

Interspeech2022 Dominika Woszczyk, Anna Hlédiková, Alican Akman, Soteris Demetriou, Björn W. Schuller
Data Augmentation for Dementia Detection in Spoken Language.

TASLP2021 Jiaming Cheng, Ruiyu Liang, Zhenlin Liang, Li Zhao 0003, Chengwei Huang, Björn W. Schuller
A Deep Adaptation Network for Speech Enhancement: Combining a Relativistic Discriminator With Multi-Kernel Maximum Mean Discrepancy.

TASLP2021 Kazi Nazmul Haque, Rajib Rana, Jiajun Liu, John H. L. Hansen, Nicholas Cummins, Carlos Busso, Björn W. Schuller
Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation Using Fewer Labelled Audio Data.

TASLP2021 N. P. Narendra, Björn W. Schuller, Paavo Alku
The Detection of Parkinson's Disease From Speech Using Voice Source Information.

ICASSP2021 Chao Li, Boyang Chen, Ziping Zhao 0001, Nicholas Cummins, Björn W. Schuller
Hierarchical Attention-Based Temporal Convolutional Networks for Eeg-Based Emotion Recognition.

ICASSP2021 Srividya Tirunellai Rajamani, Kumar T. Rajamani, Adria Mallol-Ragolta, Shuo Liu, Björn W. Schuller
A Novel Attention-Based Gated Recurrent Unit and its Efficacy in Speech Emotion Recognition.

ICASSP2021 Andreas Triantafyllopoulos, Björn W. Schuller
The Role of Task and Acoustic Similarity in Audio Transfer Learning: Insights from the Speech Emotion Recognition Case.

ICASSP2021 Panagiotis Tzirakis, Anh Nguyen 0003, Stefanos Zafeiriou, Björn W. Schuller
Speech Emotion Recognition Using Semantic Information.

Interspeech2021 Pingchuan Ma 0001, Rodrigo Mira, Stavros Petridis, Björn W. Schuller, Maja Pantic
LiRA: Learning Visual Speech Representations from Audio Through Self-Supervision.

Interspeech2021 Alice Baird, Silvan Mertes, Manuel Milling, Lukas Stappen, Thomas Wiest, Elisabeth André, Björn W. Schuller
A Prototypical Network Approach for Evaluating Generated Emotional Speech.

#5  | Shrikanth Narayanan | Google Scholar   DBLP
Venues: Interspeech: 69, ICASSP: 23, TASLP: 2, ACL: 1
Years: 2022: 4, 2021: 5, 2020: 14, 2019: 10, 2018: 14, 2017: 19, 2016: 29
ISCA Sectionspecial session: 5behavioral signal processing and speaker state and traits analytics: 5articulatory measurements and analysis: 3trustworthy speech processing: 2speaker recognition and diarization: 2speech and language analytics for mental health: 2speaker state and trait: 2speech pathology, depression, and medical applications: 2speech production and physiology: 2speaker states and traits: 2disorders related to speech and language: 2speech production and perception: 2speech analysis: 2acoustic and articulatory phonetics: 2assessment of pathological speech and language: 1emotion and sentiment analysis: 1phonetics: 1speech enhancement, bandwidth extension and hearing aids: 1the interspeech 2020 far field speaker verification challenge: 1evaluation of speech technology systems and methods for resource construction and annotation: 1speech in health: 1speech signal characterization: 1the voices from a distance challenge: 1emotion and personality in conversation: 1the second dihard speech diarization challenge (dihard ii): 1topics in speech and audio signal processing: 1integrating speech science and technology for clinical applications: 1speaker diarization: 1emotion recognition and analysis: 1spoken corpora and annotation: 1novel approaches to enhancement: 1multimodal and articulatory synthesis: 1multimodal paralinguistics: 1language models for asr: 1stance, credibility, and deception: 1noise robust and far-field asr: 1models of speech production: 1speaker characterization and recognition: 1speech production analysis and modeling: 1automatic assessment of emotions: 1first and second language acquisition: 1dialogue systems and analysis of dialogue: 1speaker diarization and recognition: 1speech enhancement: 1resources and annotation of resources: 1speech synthesis: 1language recognition: 1co-inference of production and acoustics: 1
IEEE Keywordspeech recognition: 8speaker recognition: 6emotion recognition: 4pattern clustering: 4pattern classification: 4adversarial training: 3speech: 3medical disorders: 3machine learning: 2data privacy: 2medical signal processing: 2diseases: 2signal classification: 2speaker diarization: 2prosody: 2behavioural sciences computing: 2convolutional neural nets: 2biomedical mri: 2video signal processing: 2medical image processing: 2natural language processing: 2affective computing: 2speech emotion recognition: 2entropy: 2audio signal processing: 2matrix decomposition: 2speech enhancement: 2dictionary learning: 2semi supervised learning: 2fairness: 1noise enjection: 1speech emotion: 1statistical privacy: 1statistics: 1recurrent neural nets: 1hospitals: 1ubiquitous computing: 1circadian rhythms: 1health care: 1personnel: 1multi scale: 1uniform segmentation: 1score fusion: 1behavioral signal processing: 1deception detection: 1child forensic interview: 1law administration: 1asr: 1behavior: 1couples conversations: 1military computing: 1psychology: 1suicidal risk: 1wearable computers: 1time series: 1sensor fusion: 1routine analysis: 1support vector machines: 1wearable: 1machine learning.: 1data clustering: 1convlstm: 1segmentation: 1cnn: 1rtmri: 1child speech: 1paediatrics: 1domain adversarial learning: 1gradient reversal: 1autism spectrum disorder: 1patient diagnosis: 1medical diagnostic computing: 1signal representation: 1affective representation: 1speaker invariant: 1deep latent space clustering: 1medical computing: 1x vector: 1speaker embeddings: 1clustergan: 1optimisation: 1emergency management: 1multitask learning: 1text classification: 1document handling: 1situation awareness: 1mouse ultrasonic vocalizations: 1data mining: 1clustering: 1sparse subspace clustering: 1filtering theory: 1biocommunications: 1subspace similarity: 1speaker role recognition: 1language model: 1lattice rescoring: 1speech activity detection: 1entertainment: 1movie audio: 1convo lutional 
neural networks: 1electroencephalography: 1delta: 1multitaper: 1syllable: 1bioelectric potentials: 1eeg: 1brain computer interfaces: 1complex nmf: 1real time mri: 1signal denoising: 1noise suppression: 1semi supervised learning: 1sentiment classification: 1transfer learning: 1text analysis: 1sub band shaking: 1self organising feature maps: 1shake shake regularization: 1language identification: 1speaker clustering: 1i vectors: 1physiology: 1electrodermal activity: 1audio: 1dynamical systems model: 1estimation theory: 1vocal tract area function: 1volumetric mri: 1signal to noise ratio estimation: 1regression analysis: 1speech signal processing: 1deep neural networks: 1pathological speech disorders: 1speech synthesis: 1acoustic convolution: 1acoustic noise: 1robust speech recognition: 1acoustic features: 1non negative matrix factorization: 1probability: 1unsupervised learning: 1multimodal emotion recognition: 1latent topic modeling: 1
Most Publications: 2013: 63, 2020: 61, 2019: 58, 2016: 55, 2011: 53


ICASSP2022 Tiantian Feng, Hanieh Hashemi, Murali Annavaram, Shrikanth S. Narayanan
Enhancing Privacy Through Domain Adaptive Noise Injection For Speech Emotion Recognition.

Interspeech2022 Tiantian Feng, Shrikanth Narayanan
Semi-FedSER: Semi-supervised Learning for Speech Emotion Recognition On Federated Learning using Multiview Pseudo-Labeling.

Interspeech2022 Tiantian Feng, Raghuveer Peri, Shrikanth Narayanan
User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition on Federated Learning.

Interspeech2022 Nikolaos Flemotomos, Shrikanth Narayanan
Multimodal Clustering with Role Induced Constraints for Speaker Diarization.

ICASSP2021 Amr Gaballah, Abhishek Tiwari, Shrikanth Narayanan, Tiago H. Falk
Context-Aware Speech Stress Detection in Hospital Workers Using Bi-LSTM Classifiers.

ICASSP2021 Tae Jin Park, Manoj Kumar 0007, Shrikanth Narayanan
Multi-Scale Speaker Diarization with Neural Affinity Score Fusion.

Interspeech2021 Young-Kyung Kim, Rimita Lahiri, Md. Nasir, So Hyun Kim, Somer Bishop, Catherine Lord, Shrikanth S. Narayanan
Analyzing Short Term Dynamic Speech Features for Understanding Behavioral Traits of Children with Autism Spectrum Disorder.

Interspeech2021 Haoqi Li, Yelin Kim, Cheng-Hao Kuo, Shrikanth S. Narayanan
Acted vs. Improvised: Domain Adaptation for Elicitation Approaches in Audio-Visual Emotion Recognition.

Interspeech2021 Miran Oh, Dani Byrd, Shrikanth S. Narayanan
Leveraging Real-Time MRI for Illuminating Linguistic Velum Action.

ICASSP2020 Victor Ardulov, Zane Durante, Shanna Williams, Thomas D. Lyon, Shrikanth Narayanan
Identifying Truthful Language in Child Interviews.

ICASSP2020 Sandeep Nallan Chakravarthula, Md. Nasir, Shao-Yen Tseng, Haoqi Li, Tae Jin Park, Brian R. Baucom, Craig J. Bryan, Shrikanth Narayanan, Panayiotis G. Georgiou
Automatic Prediction of Suicidal Risk in Military Couples Using Multimodal Interaction Cues from Couples Conversations.

ICASSP2020 Tiantian Feng, Shrikanth S. Narayanan
Modeling Behavioral Consistency in Large-Scale Wearable Recordings of Human Bio-Behavioral Signals.

ICASSP2020 S. Ashwin Hebbar, Rahul Sharma, Krishna Somandepalli, Asterios Toutios, Shrikanth Narayanan
Vocal Tract Articulatory Contour Detection in Real-Time Magnetic Resonance Images Using Spatio-Temporal Context.

ICASSP2020 Rimita Lahiri, Manoj Kumar 0007, Somer Bishop, Shrikanth Narayanan
Learning Domain Invariant Representations for Child-Adult Classification from Speech.

ICASSP2020 Haoqi Li, Ming Tu, Jing Huang 0019, Shrikanth Narayanan, Panayiotis G. Georgiou
Speaker-Invariant Affective Representation Learning via Adversarial Training.

ICASSP2020 Monisankha Pal, Manoj Kumar 0007, Raghuveer Peri, Tae Jin Park, So Hyun Kim, Catherine Lord, Somer Bishop, Shrikanth Narayanan
Speaker Diarization Using Latent Space Clustering in Generative Adversarial Network.

ICASSP2020 Karan Singla, Shrikanth Narayanan
Multitask Learning for Darpa Lorelei's Situation Frame Extraction Task.

ICASSP2020 Jiaxi Wang, Karel Mundnich, Allison T. Knoll, Pat Levitt, Shrikanth Narayanan
Bringing in the Outliers: A Sparse Subspace Clustering Approach to Learn a Dictionary of Mouse Ultrasonic Vocalizations.

Interspeech2020 Pavlos Papadopoulos, Shrikanth Narayanan
Exploiting Conic Affinity Measures to Design Speech Enhancement Systems Operating in Unseen Noise Conditions.

Interspeech2020 Xiaoyi Qin, Ming Li 0026, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, Haizhou Li 0001
The INTERSPEECH 2020 Far-Field Speaker Verification Challenge.

#6  | DeLiang Wang | Google Scholar   DBLP
Venues: ICASSP: 36, TASLP: 27, Interspeech: 27, SpeechComm: 1
Years: 2022: 19, 2021: 7, 2020: 15, 2019: 15, 2018: 14, 2017: 9, 2016: 12
ISCA Sectiondeep enhancement: 3speech coding and privacy: 2single-channel speech enhancement: 2speech enhancement: 2asr for noisy and far-field speech: 2spatial and phase cues for source separation and speech recognition: 2dereverberation, noise reduction, and speaker extraction: 1challenges and opportunities for signal processing and machine learning for multiple smart devices: 1speech representation: 1multi-channel speech enhancement and hearing aids: 1source separation, dereverberation and echo cancellation: 1speech and audio quality assessment: 1noise reduction and intelligibility: 1speaker and language recognition: 1novel approaches to enhancement: 1source separation from monaural input: 1deep learning for source separation and pitch tracking: 1speech-enhancement: 1music, audio, and source separation: 1source separation and spatial audio: 1
IEEE Keywordspeech enhancement: 31speaker recognition: 15recurrent neural nets: 13speech intelligibility: 13deep neural networks: 13speech recognition: 12source separation: 11reverberation: 11speech separation: 9complex spectral mapping: 8convolutional neural nets: 8fourier transforms: 7microphone arrays: 6microphones: 5permutation invariant training: 5time domain: 5array signal processing: 5time frequency analysis: 5ideal ratio mask: 5monaural speech enhancement: 4audio signal processing: 4deep casa: 4speech dereverberation: 4dereverberation: 4time frequency masking: 4robust asr: 4self attention: 3blind source separation: 3acoustic noise: 3signal denoising: 3microphone array processing: 3direction of arrival estimation: 3optimisation: 3long short term memory: 3recurrent neural networks: 3ensemble learning: 3hearing: 3computational auditory scene analysis: 3speaker separation: 3monaural speech separation: 3beamforming: 3complex ideal ratio mask: 3speech quality: 3deep neural network: 3recurrent neural network: 2time domain enhancement: 2cross corpus generalization: 2deep learning (artificial intelligence): 2multi channel speaker separation: 2location based training: 2complex domain: 2bone conduction: 2attention based fusion: 2neural cascade architecture: 2robust automatic speech recognition: 2natural language processing: 2speaker diarization: 2robust speaker localization: 2talker independent speaker separation: 2robust speaker recognition: 2covariance matrices: 2phase estimation: 2transient response: 2decoding: 2encoding: 2gated linear units: 2residual learning: 2feedforward neural nets: 2dilated convolutions: 2deep clustering: 2iterative methods: 2phase: 2spectral analysis: 2spectral mapping: 2talker independence: 1neural net architecture: 1signal representation: 1cascade architecture: 1air conduction: 1sensor fusion: 1neurocontrollers: 1multi channel aec: 1acoustic echo cancellation: 1nonlinear distortions: 1echo suppression: 1fixed array: 1mimo: 1multichannel: 
1triple path: 1frequency domain analysis: 1self supervised learning: 1continuous speech separation: 1spectral magnitude: 1cross domain speech enhancement: 1multi speaker asr: 1alimeeting: 1m2met: 1meeting transcription: 1acoustic echo suppression: 1frame level snr estimation: 1feature combination: 1quantization: 1data compression: 1pruning: 1quantisation (signal): 1model compression: 1sparse regularization: 1on device processing: 1mobile communication: 1dual microphone mobile phones: 1real time speech enhancement: 1densely connected convolutional recurrent network: 1continuous speaker separation: 1convolutional neural network: 1complex domain separation: 1self attention mechanism: 1music: 1singing voice separation: 1monaural speaker separation: 1causal processing: 1signal to noise ratio: 1channel generalization: 1robust enhancement: 1gaussian processes: 1masking based beamforming: 1gammatone frequency cepstral coefficient (gfcc): 1x vector: 1gated convolutional recurrent network: 1speech distortion: 1distortion independent acoustic modeling: 1temporal convolutional networks: 1room impulse response: 1speaker and noise independent: 1fully convolutional: 1time frequency loss: 1time domain analysis: 1dense network: 1robustness: 1processing artifacts: 1voice telecommunication: 1two stage network: 1cochannel speech separation: 1audio databases: 1divide and conquer methods: 1pattern clustering: 1fully convolutional neural network: 1mean absolute error: 1sequence to sequence mapping: 1generalisation (artificial intelligence): 1chimera++ networks: 1spatial features: 1steered response power: 1gcc phat: 1signal reconstruction: 1denoising: 1noise independent and speaker independent speech enhancement: 1real time implementation: 1tcnn: 1temporal convolutional neural network: 1phase aware speech enhancement: 1cdnn: 1learning phase: 1complex valued deep neural networks: 1convolutional recurrent network: 1causal system: 1chimera + + networks: 1phase reconstruction: 1co channel 
speech separation: 1hidden markov models: 1multi pitch tracking: 1estimation theory: 1l1 loss: 1fully connected: 1generative adversarial networks: 1phase sensitive mask: 1transfer functions: 1chime 4: 1frequency estimation: 1relative transfer function estimation: 1wiener filters: 1eigenvalues and eigenfunctions: 1supervised speech enhancement: 1binaural speech separation: 1deep neural network (dnn): 1computational auditory scene analysis (casa): 1room reverberation: 1i vector: 1probability: 1time and frequency modeling: 1pitch detection: 1deep stacking networks: 1speech coding: 1unsupervised speaker adaptation: 1batch normalization: 1chime 3: 1chime 2: 1backpropagation: 1joint training: 1deep neural networks (dnn): 1unsupervised learning: 1cochleagram: 1prediction theory: 1acoustic signal detection: 1voice activity detection: 1noise independent training: 1multi resolution stacking: 1signal resolution: 1masking based separation: 1multicontext networks: 1mapping based separation: 1speech synthesis: 1pitch estimation: 1hidden markov model: 1speaker dependent modeling: 1ideal binary mask: 1cnn: 1dnn: 1signal approximation: 1speech denoising: 1speech intelligibility test: 1anechoic chambers (acoustic): 1
Most Publications: 2022: 27, 2020: 27, 2021: 25, 2018: 23, 2019: 19


TASLP2022 Ashutosh Pandey 0004, DeLiang Wang
Self-Attending RNN for Speech Enhancement to Improve Cross-Corpus Generalization.

TASLP2022 Hassan Taherian, Ke Tan 0001, DeLiang Wang
Multi-Channel Talker-Independent Speaker Separation Through Location-Based Training.

TASLP2022 Heming Wang, DeLiang Wang
Neural Cascade Architecture With Triple-Domain Loss for Speech Enhancement.

TASLP2022 Heming Wang, Xueliang Zhang 0001, DeLiang Wang
Fusing Bone-Conduction and Air-Conduction Sensors for Complex-Domain Speech Enhancement.

TASLP2022 Hao Zhang, DeLiang Wang
Neural Cascade Architecture for Multi-Channel Acoustic Echo Suppression.

ICASSP2022 Ashutosh Pandey 0004, Buye Xu, Anurag Kumar 0003, Jacob Donley, Paul Calamia, DeLiang Wang
TPARN: Triple-Path Attentive Recurrent Network for Time-Domain Multichannel Speech Enhancement.

ICASSP2022 Hassan Taherian, Ke Tan 0001, DeLiang Wang
Location-Based Training for Multi-Channel Talker-Independent Speaker Separation.

ICASSP2022 Heming Wang, Yao Qian, Xiaofei Wang 0009, Yiming Wang, Chengyi Wang 0002, Shujie Liu 0001, Takuya Yoshioka, Jinyu Li 0001, DeLiang Wang
Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction.

ICASSP2022 Zhong-Qiu Wang, DeLiang Wang
Localization based Sequential Grouping for Continuous Speech Separation.

ICASSP2022 Heming Wang, DeLiang Wang
Cross-Domain Speech Enhancement with a Neural Cascade Architecture.

ICASSP2022 Heming Wang, Xueliang Zhang 0001, DeLiang Wang
Attention-Based Fusion for Bone-Conducted and Air-Conducted Speech Enhancement in the Complex Domain.

ICASSP2022 Fan Yu, Shiliang Zhang, Pengcheng Guo, Yihui Fu, Zhihao Du, Siqi Zheng, Weilong Huang, Lei Xie 0001, Zheng-Hua Tan, DeLiang Wang, Yanmin Qian, Kong Aik Lee, Zhijie Yan, Bin Ma 0001, Xin Xu, Hui Bu
Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.

ICASSP2022 Hao Zhang, DeLiang Wang
Neural Cascade Architecture for Joint Acoustic Echo and Noise Suppression.

Interspeech2022 Ashutosh Pandey 0004, DeLiang Wang
Attentive Training: A New Training Framework for Talker-independent Speaker Extraction.

Interspeech2022 Ashutosh Pandey 0004, Buye Xu, Anurag Kumar 0003, Jacob Donley, Paul Calamia, DeLiang Wang
Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network.

Interspeech2022 Haohe Liu, Woosung Choi, Xubo Liu, Qiuqiang Kong, Qiao Tian, DeLiang Wang
Neural Vocoder is All You Need for Speech Super-resolution.

Interspeech2022 Haohe Liu, Xubo Liu, Qiuqiang Kong, Qiao Tian, Yan Zhao 0010, DeLiang Wang, Chuanzeng Huang, Yuxuan Wang 0002
VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration.

Interspeech2022 Hao Zhang, Ashutosh Pandey 0004, DeLiang Wang
Attentive Recurrent Network for Low-Latency Active Noise Control.

Interspeech2022 Yixuan Zhang, Heming Wang, DeLiang Wang
Densely-connected Convolutional Recurrent Network for Fundamental Frequency Estimation in Noisy Speech.

TASLP2021 Hao Li 0046, DeLiang Wang, Xueliang Zhang 0001, Guanglai Gao
Recurrent Neural Networks and Acoustic Features for Frame-Level Signal-to-Noise Ratio Estimation.

#7  | Dong Yu 0001 | Google Scholar   DBLP
Venues: ICASSP: 39, Interspeech: 37, TASLP: 7, ICLR: 2, NAACL: 2, IJCAI: 1, SpeechComm: 1
Years: 2022: 14, 2021: 18, 2020: 20, 2019: 15, 2018: 10, 2017: 5, 2016: 7
ISCA Sectionspeech synthesis: 2voice conversion and adaptation: 2speaker recognition: 2source separation, dereverberation and echo cancellation: 2multi-channel speech enhancement: 2singing voice computing and processing in music: 2deep learning for source separation and pitch tracking: 2sequence models for asr: 2dereverberation and echo cancellation: 1multi-, cross-lingual and other topics in asr: 1topics in asr: 1source separation: 1novel neural network architectures for asr: 1speech localization, enhancement, and quality assessment: 1asr model training and strategies: 1speech synthesis paradigms and methods: 1multimodal speech processing: 1speech and audio source separation and scene analysis: 1speech enhancement: 1asr neural network architectures: 1asr neural network training: 1asr for noisy and far-field speech: 1robust speech recognition: 1speaker verification using neural network methods: 1expressive speech synthesis: 1topics in speech recognition: 1search, computational strategies and language modeling: 1noise robust speech recognition: 1neural networks in speech recognition: 1
IEEE Keywords: speech recognition: 28; speaker recognition: 12; recurrent neural nets: 9; speech synthesis: 8; speech separation: 7; speech enhancement: 6; natural language processing: 5; source separation: 4; end to end speech recognition: 4; data augmentation: 4; voice activity detection: 3; unsupervised learning: 3; microphone arrays: 3; speaker embedding: 3; feedforward neural nets: 3; multi task learning: 3; deep neural network: 3; voice conversion: 2; application program interfaces: 2; graphics processing units: 2; reverberation: 2; pattern clustering: 2; audio visual systems: 2; audio signal processing: 2; text analysis: 2; filtering theory: 2; neural net architecture: 2; semi supervised learning: 2; overlapped speech: 2; domain adaptation: 2; transfer learning: 2; maximum mean discrepancy: 2; speech coding: 2; code switching: 2; self attention: 2; automatic speech recognition: 2; attention based model: 2; knowledge distillation: 2; dnn: 2; permutation invariant training: 2; resnet: 2; vgg: 2; recurrent neural networks: 2; blstm: 2; conversational speech recognition: 2; lace: 2; convolutional neural networks: 2; microphones: 2; factor representation: 2; far field speech recognition: 2; supervised learning: 1; self supervised disentangled representation learning: 1; zero shot style transfer: 1; variational autoencoder: 1; low quality data: 1; neural speech synthesis: 1; style transfer: 1; dual path: 1; acoustic model: 1; dynamic weight attention: 1; echo suppression: 1; joint training: 1; streaming: 1; acoustic environment: 1; speech simulation: 1; transient response: 1; computational linguistics: 1; code switched asr: 1; bilingual asr: 1; rnn t: 1; router architecture: 1; accent embedding: 1; global information: 1; domain embedding: 1; expert systems: 1; mixture of experts: 1; speaker diarization: 1; overlap speech detection: 1; inference mechanisms: 1; speaker clustering: 1; audio visual processing: 1; sensor fusion: 1; sound source separation: 1; conversational semantic role labeling: 1; rewriting systems: 1; natural language understanding: 1; semantic role labeling: 1; dialogue understanding: 1; interactive systems: 1; multi channel: 1; audio visual: 1; jointly fine tuning: 1; visual occlusion: 1; overlapped speech recognition: 1; image recognition: 1; video signal processing: 1; adl mvdr: 1; array signal processing: 1; mvdr: 1; transferable architecture: 1; neural architecture search: 1; single channel: 1; multi granularity: 1; self attentive network: 1; synthetic speech detection: 1; replay detection: 1; res2net: 1; multi scale feature: 1; asv anti spoofing: 1; uncertainty estimation: 1; target speaker speech extraction: 1; target speaker speech recognition: 1; source localization: 1; direction of arrival estimation: 1; contrastive learning: 1; self supervised learning: 1; interference suppression: 1; target speaker enhancement: 1; robust speaker verification: 1; speaker verification (sv): 1; speech intelligibility: 1; phonetic posteriorgrams: 1; regression analysis: 1; singing synthesis: 1; multi channel speech separation: 1; spatial features: 1; end to end: 1; spatial filters: 1; inter channel convolution differences: 1; parallel optimization: 1; bmuf: 1; lstm language model: 1; random sampling: 1; model partition: 1; teacher student: 1; speaker verification: 1; accented speech recognition: 1; accent conversion: 1; target speech extraction: 1; signal reconstruction: 1; minimisation: 1; neural beamformer: 1; persistent memory: 1; dfsmn: 1; audio visual speech recognition: 1; multi modal: 1; language model: 1; asr: 1; acoustic variability: 1; sequence discriminative training: 1; hidden markov models: 1; discriminative feature learning: 1; quasifully recurrent neural network (qrnn): 1; convolutional neural nets: 1; variational inference: 1; text to speech (tts) synthesis: 1; parallel wavenet: 1; convolutional neural network (cnn): 1; parallel processing: 1; relative position aware representation: 1; sequence to sequence model: 1; text to speech synthesis: 1; all rounder: 1; teacher student training: 1; multi domain: 1; decoding: 1; privacy preserving: 1; cloud computing: 1; quantization: 1; encryption: 1; polynomials: 1; cryptography: 1; optimisation: 1; siamese neural networks: 1; end to end speaker verification: 1; seq2seq attention: 1; text dependent: 1; multi talker speech recognition: 1; unsupervised training: 1; spatial smoothing: 1; smoothing methods: 1; cocktail party problem: 1; cnn: 1; factor aware training: 1; robust speech recognition: 1; parallel data: 1; feature denoising: 1; signal classification: 1; integrated adaptation: 1; signal representation: 1; i vector: 1; lstm rnns: 1; speaker adaptation: 1; speaking rate: 1; speaker aware training: 1; highway lstm: 1; lstm: 1; cntk: 1; sequence training: 1
Most Publications: 2022: 77, 2020: 76, 2021: 68, 2019: 66, 2023: 28

Affiliations
Tencent AI Lab, China
Microsoft Research, Redmond, WA, USA (1998 - 2017)
University of Idaho, Moscow, ID, USA (PhD)

ICASSP2022 Jiachen Lian, Chunlei Zhang, Dong Yu 0001
Robust Disentangled Variational Speech Representation Learning for Zero-Shot Voice Conversion.

ICASSP2022 Songxiang Liu, Shan Yang, Dan Su 0002, Dong Yu 0001
Referee: Towards Reference-Free Cross-Speaker Style Transfer with Low-Quality Data for Expressive Speech Synthesis.

ICASSP2022 Dongpeng Ma, Yiwen Wang, Liqiang He, Mingjie Jin, Dan Su 0002, Dong Yu 0001
DP-DWA: Dual-Path Dynamic Weight Attention Network With Streaming Dfsmn-San For Automatic Speech Recognition.

ICASSP2022 Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu 0003, Zhenyu Tang 0001, Dinesh Manocha, Dong Yu 0001
Fast-Rir: Fast Neural Diffuse Room Impulse Response Generator.

ICASSP2022 Brian Yan, Chunlei Zhang, Meng Yu 0003, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe 0001, Dong Yu 0001
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization.

ICASSP2022 Zhao You, Shulin Feng, Dan Su 0002, Dong Yu 0001
Speechmoe2: Mixture-of-Experts Model with Improved Routing.

ICASSP2022 Chunlei Zhang, Jiatong Shi, Chao Weng, Meng Yu 0003, Dong Yu 0001
Towards end-to-end Speaker Diarization with Generalized Neural Speaker Clustering.

Interspeech2022 Ziqian Dai, Jianwei Yu, Yan Wang, Nuo Chen, Yanyao Bian, Guangzhi Li, Deng Cai 0002, Dong Yu 0001
Automatic Prosody Annotation with Pre-Trained Text-Speech Model.

Interspeech2022 Vinay Kothapally, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Dong Yu 0001
Joint Neural AEC and Beamforming with Double-Talk Detection.

Interspeech2022 Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu 0001
Towards Improved Zero-shot Voice Conversion with Conditional DSVAE.

Interspeech2022 Jinchuan Tian, Jianwei Yu, Chunlei Zhang, Yuexian Zou, Dong Yu 0001
LAE: Language-Aware Encoder for Monolingual and Multilingual ASR.

ICLR2022 Max W. Y. Lam, Jun Wang 0091, Dan Su 0002, Dong Yu 0001
BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis.

IJCAI2022 Rongjie Huang, Max W. Y. Lam, Jun Wang 0091, Dan Su 0002, Dong Yu 0001, Yi Ren 0006, Zhou Zhao, 
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis.

NAACL2022 Dian Yu 0001, Ben Zhou, Dong Yu 0001
End-to-End Chinese Speaker Identification.

TASLP2021 Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu 0004, Meng Yu 0003, Dong Yu 0001, Jesper Jensen 0001, 
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation.

TASLP2021 Kun Xu 0005, Han Wu 0004, Linfeng Song, Haisong Zhang, Linqi Song, Dong Yu 0001
Conversational Semantic Role Labeling.

TASLP2021 Jianwei Yu, Shi-Xiong Zhang, Bo Wu, Shansong Liu, Shoukang Hu, Mengzhe Geng, Xunying Liu, Helen Meng, Dong Yu 0001
Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech.

TASLP2021 Zhuohuang Zhang, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Lianwu Chen, Donald S. Williamson, Dong Yu 0001
Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation.

ICASSP2021 Liqiang He, Dan Su 0002, Dong Yu 0001
Learned Transferable Architectures Can Surpass Hand-Designed Architectures for Large Scale Speech Recognition.

ICASSP2021 Max W. Y. Lam, Jun Wang 0091, Dan Su 0002, Dong Yu 0001
Sandglasset: A Light Multi-Granularity Self-Attentive Network for Time-Domain Speech Separation.

#8  | Yanmin Qian | Google Scholar   DBLP
Venues: Interspeech: 37, ICASSP: 31, TASLP: 15, SpeechComm: 4
Years: 2022: 20, 2021: 15, 2020: 12, 2019: 11, 2018: 14, 2017: 7, 2016: 8
ISCA Sections: embedding and network architecture for speaker recognition: 4; speaker recognition and anti-spoofing: 2; noise robust and distant speech recognition: 2; speaker recognition: 2; deep learning for source separation and pitch tracking: 2; novel models and training methods for asr: 1; speaker embedding and diarization: 1; speech enhancement and intelligibility: 1; source separation: 1; topics in asr: 1; sdsv challenge 2021: 1; speech synthesis: 1; multimodal systems: 1; speaker, language, and privacy: 1; speaker recognition challenges and applications: 1; learning techniques for speaker recognition: 1; targeted source separation: 1; multilingual and code-switched asr: 1; anti-spoofing and liveness detection: 1; spoken term detection, confidence measure, and end-to-end speech recognition: 1; feature extraction for asr: 1; the 2019 automatic speaker verification spoofing and countermeasures challenge: 1; asr neural network training: 1; speech and audio source separation and scene analysis: 1; robust speech recognition: 1; acoustic modelling: 1; short utterances speaker recognition: 1; search, computational strategies and language modeling: 1; noise robust speech recognition: 1; spoken term detection: 1
IEEE Keywords: speech recognition: 25; speaker recognition: 22; recurrent neural nets: 11; speaker verification: 9; data augmentation: 7; feedforward neural nets: 6; end to end: 5; robust speech recognition: 5; natural language processing: 4; multi task learning: 4; gaussian processes: 4; deep neural network: 4; continuous speech separation: 3; unsupervised learning: 3; speech separation: 3; audio signal processing: 3; signal classification: 3; convolution: 3; source separation: 3; reverberation: 3; data handling: 3; i vector: 3; permutation invariant training: 3; knowledge distillation: 3; spoofing detection: 3; dual path modeling: 2; deep learning (artificial intelligence): 2; curriculum learning: 2; speaker embedding: 2; speech enhancement: 2; audio visual: 2; transforms: 2; mixture models: 2; speech synthesis: 2; text dependent speaker verification: 2; acoustic modeling: 2; triplet loss: 2; factor aware training: 2; cluster adaptive training: 2; multi talker speech recognition: 2; speaker adaptation: 2; convolutional neural networks: 2; signal representation: 2; microphones: 2; factor representation: 2; pattern clustering: 2; far field speech recognition: 2; memory pool: 1; overlap ratio predictor: 1; multi accent: 1; accent embedding: 1; end to end speech recognition: 1; encoding: 1; layer wise adaptation: 1; optimisation: 1; length perturbation: 1; low resource speech recognition: 1; representation learning: 1; image representation: 1; self supervised pretrain: 1; text independent: 1; multilayer perceptrons: 1; multi layer perceptron: 1; convolution attention: 1; local attention: 1; gaussian attention: 1; local information: 1; skipping memory: 1; low latency: 1; real time: 1; time domain analysis: 1; synchronisation: 1; multi modality: 1; object detection: 1; low quality video: 1; video signal processing: 1; attention: 1; multi speaker asr: 1; alimeeting: 1; speaker diarization: 1; m2met: 1; meeting transcription: 1; microphone arrays: 1; edge devices: 1; punctuation prediction: 1; streaming speech recognition: 1; multi modal system: 1; face recognition: 1; biometrics (access control): 1; audio visual deep neural network: 1; data analysis: 1; person verification: 1; unknown kind spoofing detection: 1; constant q modified octave coefficients: 1; modified magnitude phase spectrum: 1; signal detection: 1; complex backpropagation: 1; transfer functions: 1; blind source separation: 1; array signal processing: 1; signal to distortion ratio: 1; multi channel source separation: 1; acoustic beamforming: 1; domain adaptation: 1; contrastive learning: 1; self supervised learning: 1; test time augmentation: 1; accent identification: 1; ppg: 1; phone posteriorgram: 1; data fusion: 1; tts based data augmentation: 1; x vector: 1; unit selection synthesis: 1; long recording speech separation: 1; convolutional neural nets: 1; online processing: 1; accented speech recognition: 1; accent recognition: 1; end to end asr: 1; data selection: 1; children’s speech recognition: 1; text to speech: 1; variational auto encoder: 1; text independent speaker verification: 1; generative adversarial network: 1; end to end model: 1; multi talker mixed speech recognition: 1; decoding: 1; transformer: 1; neural beamforming: 1; overlapped speech recognition: 1; channel information: 1; adversarial training: 1; multitask learning: 1; audio visual systems: 1; attention mechanism: 1; multimodal: 1; speaker neural embedding: 1; angular softmax: 1; center loss: 1; short duration text independent speaker verification: 1; computer aided instruction: 1; teacher student learning: 1; security of data: 1; convolutional neural network: 1; residual learning: 1; unsupervised training: 1; auxiliary features: 1; very deep convolutional neural network: 1; generative adversarial networks: 1; hard trial selection: 1; expectation maximisation algorithm: 1; future vector: 1; very deep convolution residual network: 1; speech intelligibility: 1; dilated convolution: 1; co channel speaker identification: 1; focal loss: 1; speech coding: 1; lattice: 1; decoder: 1; lvcsr: 1; kws: 1; wfst: 1; dlss: 1; lattice theory: 1; ctc: 1; deep features: 1; noise robust: 1; cldnn: 1; btas2016: 1; very deep cnns: 1; matrix algebra: 1; hidden markov models: 1; interpolation: 1; parallel data: 1; feature denoising: 1; integrated adaptation: 1; lstm rnns: 1; speaking rate: 1; speaker aware training: 1; audio segmentation: 1; television broadcasting: 1; multi genre broadcast data: 1; error analysis: 1
Most Publications: 2022: 48, 2021: 34, 2018: 26, 2019: 21, 2020: 20


TASLP2022 Chenda Li, Zhuo Chen 0006, Yanmin Qian
Dual-Path Modeling With Memory Embedding Model for Continuous Speech Separation.

TASLP2022 Yanmin Qian, Xun Gong 0005, Houjun Huang, 
Layer-Wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition.

TASLP2022 Yanmin Qian, Zhikai Zhou, 
Optimizing Data Usage for Low-Resource Speech Recognition.

ICASSP2022 Zhengyang Chen, Sanyuan Chen, Yu Wu 0012, Yao Qian, Chengyi Wang 0002, Shujie Liu 0001, Yanmin Qian, Michael Zeng 0001, 
Large-Scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification.

ICASSP2022 Bing Han, Zhengyang Chen, Bei Liu, Yanmin Qian
MLP-SVNET: A Multi-Layer Perceptrons Based Network for Speaker Verification.

ICASSP2022 Bing Han, Zhengyang Chen, Yanmin Qian
Local Information Modeling with Self-Attention for Speaker Verification.

ICASSP2022 Chenda Li, Lei Yang, Weiqin Wang, Yanmin Qian
Skim: Skipping Memory Lstm for Low-Latency Real-Time Continuous Speech Separation.

ICASSP2022 Wei Wang, Xun Gong 0005, Yifei Wu, Zhikai Zhou, Chenda Li, Wangyou Zhang, Bing Han, Yanmin Qian
The Sjtu System For Multimodal Information Based Speech Processing Challenge 2021.

ICASSP2022 Yifei Wu, Chenda Li, Jinfeng Bai, Zhongqin Wu, Yanmin Qian
Time-Domain Audio-Visual Speech Separation on Low Quality Videos.

ICASSP2022 Fan Yu, Shiliang Zhang, Pengcheng Guo, Yihui Fu, Zhihao Du, Siqi Zheng, Weilong Huang, Lei Xie 0001, Zheng-Hua Tan, DeLiang Wang, Yanmin Qian, Kong Aik Lee, Zhijie Yan, Bin Ma 0001, Xin Xu, Hui Bu, 
Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.

ICASSP2022 Zhikai Zhou, Tian Tan 0002, Yanmin Qian
Punctuation Prediction for Streaming On-Device Speech Recognition.

Interspeech2022 Xun Gong 0005, Zhikai Zhou, Yanmin Qian
Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition.

Interspeech2022 Bing Han, Zhengyang Chen, Yanmin Qian
Self-Supervised Speaker Verification Using Dynamic Loss-Gate and Label Correction.

Interspeech2022 Tao Liu, Shuai Fan 0005, Xu Xiang, Hongbo Song, Shaoxiong Lin, Jiaqi Sun, Tianyuan Han, Siyuan Chen, Binwei Yao, Sen Liu, Yifei Wu, Yanmin Qian, Kai Yu 0004, 
MSDWild: Multi-modal Speaker Diarization Dataset in the Wild.

Interspeech2022 Bei Liu, Zhengyang Chen, Yanmin Qian
Attentive Feature Fusion for Robust Speaker Verification.

Interspeech2022 Bei Liu, Zhengyang Chen, Yanmin Qian
Dual Path Embedding Learning for Speaker Verification with Triplet Attention.

Interspeech2022 Bei Liu, Zhengyang Chen, Shuai Wang 0016, Haoyu Wang, Bing Han, Yanmin Qian
DF-ResNet: Boosting Speaker Verification Performance with Depth-First Design.

Interspeech2022 Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao 0001, Yanmin Qian, Shinji Watanabe 0001, 
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.

Interspeech2022 Wangyou Zhang, Zhuo Chen 0006, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei, 
Separating Long-Form Speech with Group-wise Permutation Invariant Training.

Interspeech2022 Leying Zhang, Zhengyang Chen, Yanmin Qian
Enroll-Aware Attentive Statistics Pooling for Target Speaker Verification.

#9  | John H. L. Hansen | Google Scholar   DBLP
Venues: Interspeech: 52, TASLP: 13, ICASSP: 12, SpeechComm: 9
Years: 2022: 13, 2021: 9, 2020: 9, 2019: 11, 2018: 14, 2017: 17, 2016: 13
ISCA Sections: speaker recognition: 3; applications in transcription, education and learning: 2; dereverberation and echo cancellation: 2; speaker recognition challenges and applications: 2; integrating speech science and technology for clinical applications: 2; speaker recognition evaluation: 2; special session: 2; spoken language processing: 1; pathological speech analysis: 1; resource-constrained asr: 1; speech representation: 1; speech enhancement and intelligibility: 1; embedding and network architecture for speaker recognition: 1; multi-, cross-lingual and other topics in asr: 1; asr technologies and systems: 1; target speaker detection, localization and separation: 1; speech and audio quality assessment: 1; language learning: 1; the fearless steps challenge phase-02: 1; speaker embedding: 1; topics in speech and audio signal processing: 1; speaker recognition and diarization: 1; language learning and databases: 1; speech perception in adverse listening conditions: 1; speech enhancement: 1; speaker and language recognition: 1; speech and audio source separation and scene analysis: 1; speaker verification: 1; speaker verification using neural network methods: 1; adjusting to speaker, accent, and domain: 1; spoken corpora and annotation: 1; speech analysis and representation: 1; signal analysis for the natural, biological and social sciences: 1; music and audio processing: 1; speech recognition: 1; source separation and voice activity detection: 1; far-field speech recognition: 1; speaker and language recognition applications: 1; robust speaker recognition: 1; dereverberation, echo cancellation and speech: 1; language recognition: 1; speaker diarization and recognition: 1; spoken term detection: 1; multimodal processing: 1
IEEE Keywords: speaker recognition: 16; speech recognition: 6; gaussian processes: 5; convolutional neural nets: 4; speaker verification: 4; natural language processing: 4; audio signal processing: 3; calibration: 3; optimisation: 3; mixture models: 3; unsupervised learning: 3; pattern clustering: 3; deep neural networks: 2; speech enhancement: 2; generative adversarial networks: 2; neural net architecture: 2; text analysis: 2; signal detection: 2; speech separation: 2; co channel speech detection: 2; overlapping speech detection: 2; peer led team learning: 2; domain adaptation: 2; deep neural network: 2; i vector: 2; language identification: 2; nist opensat: 2; speech activity detection: 2; nist opensad: 2; clustering: 2; hartigan dip test: 2; probability: 2; speech coding: 2; support vector machines: 2; speaker diarization: 2; time frequency analysis: 1; reverberation: 1; multi source domain adaptation: 1; discrepancy loss: 1; forensics: 1; domain adversarial training: 1; maximum mean discrepancy: 1; moment matching: 1; signal representation: 1; disentangled representation learning: 1; guided representation learning: 1; audio generation: 1; and generative adversarial neural network: 1; lombard effect: 1; whisper/vocal effort: 1; speech modeling: 1; binary classifier: 1; 1 d cnn: 1; convolutional neural network: 1; cocktail party problem: 1; simultaneous speaker detection: 1; speech synthesis: 1; residual learning: 1; adversarial domain adaptation: 1; embedding disentangling: 1; deep learning (artificial intelligence): 1; siamese networks: 1; computer assisted language learning: 1; mispronunciation verification: 1; phone embedding: 1; source counting: 1; convolutional neural networks: 1; voice activity detection: 1; mixed speech: 1; sincnet: 1; audio diarization: 1; speaker clustering: 1; transfer learning: 1; embedded systems: 1; speaker embedding: 1; nist sre: 1; pattern classification: 1; semi supervised learning: 1; arabic dialect identification: 1; darpa rats: 1; frequency dependent kernel: 1; principal component analysis: 1; audio streaming: 1; transforms: 1; speaker recognition evaluation: 1; expectation maximisation algorithm: 1; adversarial autoencoder: 1; bottleneck feature: 1; phonetic label estimation: 1; language/dialect recognition: 1; variational autoencoder: 1; zero resource speech processing: 1; iterative methods: 1; opinion: 1; ut sentiment audio archive: 1; amazon: 1; maximum entropy: 1; keyword spotting: 1; lvcsr: 1; sentiment analysis: 1; maximum entropy methods: 1; signal classification: 1; modulation: 1; demodulation: 1; frequency offset: 1; single sideband (ssb): 1; bottom up clustering: 1; active learning: 1; support vectors: 1; discriminant analysis: 1; i vector/plda speaker recognition: 1; microphones: 1; overlapped speech: 1; hidden markov models: 1; crosstalk: 1; acoustic scene analysis: 1; emotion recognition: 1; denoising autoencoders: 1; vector taylor series: 1; human computer interaction: 1; cepstral analysis: 1; whispered speech recognition: 1; generative models: 1; signal denoising: 1; ageing: 1; speaker variability: 1; aging: 1; quality measures: 1; limited resources: 1; estimation theory: 1; snr estimation: 1; harmonics: 1; temporal continuity constraints: 1; f0 estimation: 1; local tf segment: 1
Most Publications: 2010: 35, 2016: 34, 2014: 34, 2017: 33, 2015: 32


SpeechComm2022 Rasa Lileikyte, Dwight Irvin, John H. L. Hansen
Assessing child communication engagement and statistical speech patterns for American English via speech recognition in naturalistic active learning spaces.

TASLP2022 Vinay Kothapally, John H. L. Hansen
SkipConvGAN: Monaural Speech Dereverberation Using Generative Adversarial Networks via Complex Time-Frequency Masking.

TASLP2022 Zhenyu Wang, John H. L. Hansen
Multi-Source Domain Adaptation for Text-Independent Forensic Speaker Recognition.

Interspeech2022 Chelzy Belitz, John H. L. Hansen
Challenges in Metadata Creation for Massive Naturalistic Team-Based Audio Data.

Interspeech2022 Avamarie Brueggeman, John H. L. Hansen
Speaker Trait Enhancement for Cochlear Implant Users: A Case Study for Speaker Emotion Perception.

Interspeech2022 Szu-Jui Chen, Jiamin Xie, John H. L. Hansen
FeaRLESS: Feature Refinement Loss for Ensembling Self-Supervised Learning Features in Robust End-to-end Speech Recognition.

Interspeech2022 Satwik Dutta, Sarah Anne Tao, Jacob C. Reyna, Rebecca Elizabeth Hacker, Dwight W. Irvin, Jay F. Buzhardt, John H. L. Hansen
Challenges remain in Building ASR for Spontaneous Preschool Children Speech in Naturalistic Educational Environments.

Interspeech2022 John H. L. Hansen, Zhenyu Wang, 
Audio Anti-spoofing Using Simple Attention Module and Joint Optimization Based on Additive Angular Margin Loss and Meta-learning.

Interspeech2022 Vinay Kothapally, John H. L. Hansen
Complex-Valued Time-Frequency Self-Attention for Speech Dereverberation.

Interspeech2022 Juliana N. Saba, John H. L. Hansen
Speech Modification for Intelligibility in Cochlear Implant Listeners: Individual Effects of Vowel- and Consonant-Boosting.

Interspeech2022 Mufan Sang, John H. L. Hansen
Multi-Frequency Information Enhanced Channel Attention Module for Speaker Representation Learning.

Interspeech2022 Jiamin Xie, John H. L. Hansen
DEFORMER: Coupling Deformed Localized Patterns with Global Context for Robust End-to-end Speech Recognition.

Interspeech2022 Mu Yang, Kevin Hirschi, Stephen Daniel Looney, Okim Kang, John H. L. Hansen
Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment.

SpeechComm2021 Fahimeh Bahmaninezhad, Chunlei Zhang, John H. L. Hansen
An investigation of domain adaptation in speaker embedding space for speaker recognition.

SpeechComm2021 Shivesh Ranjan, John H. L. Hansen
Curriculum Learning based approaches for robust end-to-end far-field speech recognition.

SpeechComm2021 John H. L. Hansen, Allen R. Stauffer, Wei Xia, 
Nonlinear waveform distortion: Assessment and detection of clipping on speech data and systems.

TASLP2021 Kazi Nazmul Haque, Rajib Rana, Jiajun Liu, John H. L. Hansen, Nicholas Cummins, Carlos Busso, Björn W. Schuller, 
Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation Using Fewer Labelled Audio Data.

TASLP2021 Finnian Kelly, John H. L. Hansen
Analysis and Calibration of Lombard Effect and Whisper for Speaker Recognition.

TASLP2021 Midia Yousefi, John H. L. Hansen
Block-Based High Performance CNN Architectures for Frame-Level Overlapping Speech Detection.

ICASSP2021 Mufan Sang, Wei Xia, John H. L. Hansen
DEAAN: Disentangled Embedding and Adversarial Adaptation Network for Robust Speaker Representation Learning.

#10  | Junichi Yamagishi | Google Scholar   DBLP
Venues: Interspeech: 47, ICASSP: 25, TASLP: 12, SpeechComm: 2
Years: 2022: 12, 2021: 7, 2020: 16, 2019: 13, 2018: 10, 2017: 13, 2016: 15
ISCA Sections: speech synthesis: 13; special session: 4; voice anti-spoofing and countermeasure: 3; voice privacy challenge: 3; speech synthesis paradigms and methods: 2; the voicemos challenge: 1; single-channel and multi-channel speech enhancement: 1; speech coding and restoration: 1; spoofing-aware automatic speaker verification (sasv): 1; intelligibility-enhancing speech modification: 1; single-channel speech enhancement: 1; emotion modeling and analysis: 1; neural techniques for voice conversion and waveform generation: 1; the 2019 automatic speaker verification spoofing and countermeasures challenge: 1; expressive speech synthesis: 1; voice conversion and speech synthesis: 1; prosody modeling and generation: 1; speaker verification: 1; glottal source modeling: 1; voice conversion: 1; speech perception: 1; prosody and text processing: 1; wavenet and novel paradigms: 1; speech intelligibility: 1; speech synthesis prosody: 1; resources and annotation of resources: 1; co-inference of production and acoustics: 1
IEEE Keywords: speech synthesis: 25; speaker recognition: 12; vocoders: 7; speech recognition: 6; speech intelligibility: 5; text to speech: 5; neural network: 5; speaker verification: 4; speech coding: 4; recurrent neural nets: 4; voice conversion: 3; filtering theory: 3; hidden markov models: 3; security of data: 3; speaker adaptation: 3; autoregressive processes: 3; automatic speaker verification: 2; musical instruments: 2; music: 2; natural language processing: 2; data privacy: 2; mos prediction: 2; anti spoofing: 2; presentation attack detection: 2; speech enhancement: 2; reverberation: 2; variational auto encoder: 2; fourier transforms: 2; fundamental frequency: 2; probability: 2; spoofing attack: 2; wavenet: 2; autoregressive model: 2; security: 1; reinforcement learning: 1; spoof countermeasures: 1; gaussian processes: 1; musical instrument embeddings: 1; linkability: 1; speaker anonymization: 1; privacy: 1; mean opinion score: 1; speech naturalness assessment: 1; hearing: 1; speech quality assessment: 1; efficiency: 1; pruning: 1; vocoder: 1; estimation theory: 1; countermeasure: 1; logical access: 1; computer crime: 1; tdnn: 1; feedforward neural nets: 1; resnet: 1; attention: 1; deep learning (artificial intelligence): 1; time frequency analysis: 1; multi metric optimization: 1; generative adversarial networks: 1; entertainment: 1; listening test: 1; rakugo: 1; representation learning: 1; speaker diarization: 1; phone recognition: 1; vector quantisation: 1; disentanglement: 1; image coding: 1; duration modeling: 1; vector quantization: 1; spoofing counter measures: 1; automatic speaker verification (asv): 1; detection cost function: 1; backpropagation: 1; voice cloning: 1; waveform model: 1; convolution: 1; short time fourier transform: 1; transfer learning: 1; speaker embeddings: 1; sequences: 1; search problems: 1; stochastic processes: 1; sequence to sequence model: 1; sampling methods: 1; zero shot adaptation: 1; fine tuning: 1; audio signal processing: 1; neural waveform synthesizer: 1; musical instrument sounds synthesis: 1; signal classification: 1; cepstral analysis: 1; restricted boltzmann machine: 1; boltzmann machines: 1; complex valued representation: 1; gan: 1; glottal excitation model: 1; inference mechanisms: 1; neural vocoding: 1; asvspoof: 1; replay attacks: 1; lombard speech: 1; style conversion: 1; pulse model in log domain vocoder: 1; vocal effort: 1; cyclegan: 1; neural net architecture: 1; neural waveform modeling: 1; maximum likelihood estimation: 1; spectral analysis: 1; waveform analysis: 1; gaussian distribution: 1; waveform modeling: 1; waveform generators: 1; gradient methods: 1; tacotron: 1; pipelines: 1; text analysis: 1; remote state estimation: 1; generalized closed skew normal distribution: 1; event based scheduling: 1; f0: 1; pitch: 1; general adversarial network: 1; autoregressive moving average processes: 1; autoregressive neural network: 1; i vector: 1; non parallel training: 1; dnns: 1; speech manipulation: 1; voice morphing: 1; recurrent neural network: 1; filters: 1; mixture density network: 1; privacy protection: 1; phase coding: 1; matrix algebra: 1; complex valued neural network: 1; complex amplitude: 1; phase modelling: 1; hybrid synthesis: 1; unit selection: 1; embedding: 1; deep neural networks: 1; multi task learning: 1; deep neural network: 1; wavelet transforms: 1; continuous wavelet transform: 1; f0 modelling: 1
Most Publications: 2018: 52, 2020: 48, 2021: 45, 2022: 44, 2019: 44

Affiliations
National Institute of Informatics, Tokyo, Japan
University of Edinburgh, Scotland, UK (former)

TASLP2022 Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi
Optimizing Tandem Speaker Verification and Anti-Spoofing Systems.

TASLP2022 Xuan Shi, Erica Cooper, Junichi Yamagishi
Use of Speaker Recognition Approaches for Learning and Evaluating Embedding Representations of Musical Instrument Sounds.

TASLP2022 Brij Mohan Lal Srivastava, Mohamed Maouche, Md. Sahidullah, Emmanuel Vincent 0001, Aurélien Bellet, Marc Tommasi, Natalia A. Tomashenko, Xin Wang 0037, Junichi Yamagishi
Privacy and Utility of X-Vector Based Speaker Anonymization.

ICASSP2022 Erica Cooper, Wen-Chin Huang, Tomoki Toda, Junichi Yamagishi
Generalization Ability of MOS Prediction Networks.

ICASSP2022 Wen-Chin Huang, Erica Cooper, Junichi Yamagishi, Tomoki Toda, 
LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech.

ICASSP2022 Cheng-I Jeff Lai, Erica Cooper, Yang Zhang 0001, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David D. Cox, James R. Glass, 
On the Interplay between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis.

ICASSP2022 Xin Wang 0037, Junichi Yamagishi
Estimating the Confidence of Speech Spoofing Countermeasure.

ICASSP2022 Chang Zeng, Xin Wang 0037, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi
Attention Back-End for Automatic Speaker Verification with Multiple Enrollment Utterances.

Interspeech2022 Wen-Chin Huang, Erica Cooper, Yu Tsao 0001, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi
The VoiceMOS Challenge 2022.

Interspeech2022 Haoyu Li, Junichi Yamagishi
DDS: A new device-degraded speech dataset for speech enhancement.

Interspeech2022 Xiaoxiao Miao, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, Natalia A. Tomashenko, 
Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions.

Interspeech2022 Chang Zeng, Lin Zhang, Meng Liu, Junichi Yamagishi
Spoofing-Aware Attention based ASV Back-end with Multiple Enrollment Utterances and a Sampling Strategy for the SASV Challenge 2022.

TASLP2021 Haoyu Li, Junichi Yamagishi
Multi-Metric Optimization Using Generative Adversarial Networks for Near-End Speech Intelligibility Enhancement.

ICASSP2021 Shuhei Kato, Yusuke Yasuda, Xin Wang 0037, Erica Cooper, Junichi Yamagishi
How Similar or Different is Rakugo Speech Synthesizer to Professional Performers?

ICASSP2021 Jennifer Williams 0001, Yi Zhao 0006, Erica Cooper, Junichi Yamagishi
Learning Disentangled Phone and Speaker Representations in a Semi-Supervised VQ-VAE Paradigm.

ICASSP2021 Yusuke Yasuda, Xin Wang 0037, Junichi Yamagishi
End-to-End Text-to-Speech Using Latent Duration Based on VQ-VAE.

Interspeech2021 Tomi Kinnunen, Andreas Nautsch, Md. Sahidullah, Nicholas W. D. Evans, Xin Wang 0037, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee, 
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing.

Interspeech2021 Xin Wang 0037, Junichi Yamagishi
A Comparative Study on Recent Neural Spoofing Countermeasures for Synthetic Speech Detection.

Interspeech2021 Lin Zhang, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, Jose Patino 0001, Nicholas W. D. Evans, 
An Initial Investigation for Detecting Partially Spoofed Audio.

TASLP2020 Tomi Kinnunen, Héctor Delgado, Nicholas W. D. Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang 0037, Md. Sahidullah, Junichi Yamagishi, Douglas A. Reynolds, 
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals.

#11  | Xunying Liu | Google Scholar   DBLP
Venues: Interspeech: 39, ICASSP: 31, TASLP: 14
Years: 2022: 15, 2021: 24, 2020: 13, 2019: 17, 2018: 9, 2017: 2, 2016: 4
ISCA Sections: speech recognition of atypical speech: 5; voice conversion and adaptation: 2; topics in asr: 2; asr neural network architectures: 2; medical applications and visual asr: 2; multi-, cross-lingual and other topics in asr: 1; novel models and training methods for asr: 1; multimodal speech emotion recognition and paralinguistics: 1; miscellaneous topics in speech, voice and hearing disorders: 1; speech and language in health: 1; zero, low-resource and multi-modal speech recognition: 1; voice anti-spoofing and countermeasure: 1; non-autoregressive sequential modeling for speech processing: 1; assessment of pathological speech and language: 1; speaker recognition: 1; multimodal speech processing: 1; learning techniques for speaker recognition: 1; speech and speaker recognition: 1; neural techniques for voice conversion and waveform generation: 1; speech and audio classification: 1; model adaptation for asr: 1; lexicon and language model for speech recognition: 1; novel neural network architectures for acoustic modelling: 1; second language acquisition and code-switching: 1; voice conversion: 1; multimodal systems: 1; expressive speech synthesis: 1; application of asr in medical practice: 1; acoustic model adaptation: 1; new products and services: 1; speech synthesis: 1
IEEE Keywords: speech recognition: 28; speaker recognition: 14; recurrent neural nets: 13; natural language processing: 10; speech synthesis: 9; bayes methods: 8; optimisation: 6; deep learning (artificial intelligence): 5; gaussian processes: 5; speech separation: 5; neural architecture search: 4; bayesian learning: 4; speech coding: 4; emotion recognition: 4; speech emotion recognition: 4; quantisation (signal): 4; language models: 4; speaker adaptation: 3; audio visual systems: 3; voice conversion: 3; speech intelligibility: 3; multi channel: 3; overlapped speech: 3; gradient methods: 3; convolutional neural nets: 3; recurrent neural network: 3; disordered speech recognition: 2; handicapped aids: 2; time delay neural network: 2; audio visual: 2; audio signal processing: 2; dysarthric speech reconstruction: 2; multi look: 2; domain adaptation: 2; variational inference: 2; inference mechanisms: 2; lhuc: 2; admm: 2; transformer: 2; knowledge distillation: 2; text to speech: 2; quantization: 2; speaker verification: 2; code switching: 2; language model: 2; entropy: 2; elderly speech recognition: 1; neural net architecture: 1; search problems: 1; minimisation: 1; uncertainty handling: 1; model uncertainty: 1; monte carlo methods: 1; neural language models: 1; speech enhancement: 1; dereverberation and recognition: 1; reverberation: 1; unsupervised learning: 1; multitask learning: 1; speaker change detection: 1; unsupervised speech decomposition: 1; speaker identity: 1; adversarial speaker adaptation: 1; uniform sampling: 1; path dropout: 1; neural network quantization: 1; mean square error methods: 1; mixed precision: 1; source separation: 1; direction of arrival: 1; speaker diarization: 1; direction of arrival estimation: 1; gaussian process: 1; lf mmi: 1; delays: 1; generalisation (artificial intelligence): 1; signal sampling: 1; location relative attention: 1; signal representation: 1; signal reconstruction: 1; sequence to sequence modeling: 1; any to many: 1; data augmentation: 1; multimodal speech recognition: 1; residual error: 1; capsule: 1; exemplary emotion descriptor: 1; expressive speech synthesis: 1; spatial information: 1; recurrent: 1; capsule network: 1; sequential: 1; tdnn: 1; adaptation: 1; switchboard: 1; low bit quantization: 1; lstm rnn: 1; filtering theory: 1; jointly fine tuning: 1; microphone arrays: 1; visual occlusion: 1; overlapped speech recognition: 1; image recognition: 1; video signal processing: 1; synthetic speech detection: 1; replay detection: 1; voice activity detection: 1; res2net: 1; multi scale feature: 1; asv anti spoofing: 1; alzheimer's disease detection: 1; features: 1; cognition: 1; adress: 1; medical diagnostic computing: 1; geriatrics: 1; asr: 1; diseases: 1; signal classification: 1; patient diagnosis: 1; linguistics: 1; controllable and efficient: 1; autoregressive processes: 1; prosody modelling: 1; semi autoregressive: 1; elderly speech: 1; automatic speech recognition: 1; neurocognitive disorder detection: 1; dementia: 1; audio visual speech recognition (avsr): 1; visual feature generation: 1; phonetic posteriorgrams: 1; x vector: 1; gmm i vector: 1; adversarial attack: 1; accented speech recognition: 1; accent conversion: 1; cross modal: 1; seq2seq: 1; data compression: 1; recurrent neural networks: 1; alternating direction methods of multipliers: 1; audio visual speech recognition: 1; multi modal: 1; probability: 1; succeeding words: 1; keyword search: 1; feedforward: 1; end to end: 1; multilingual speech synthesis: 1; foreign accent: 1; activation function selection: 1; gaussian process neural network: 1; bayesian neural network: 1; lstm: 1; neural network language models: 1; parameter estimation: 1; connectionist temporal classification (ctc): 1; convolutional neural network (cnn): 1; e learning: 1; mispronunciation detection and diagnosis (mdd): 1; computer assisted pronunciation training (capt): 1; capsule networks: 1; spatial relationship information: 1; recurrent connection: 1; utterance level features: 1; maximum likelihood estimation: 1; hidden markov models: 1; rnnlms: 1; natural gradient: 1; limited memory bfgs: 1; second order optimization: 1; hessian matrices: 1; unsupervised clustering: 1; extended phoneme set in l2 speech: 1; mispronunciation patterns: 1; phonemic posteriorgrams: 1; mispronunciation detection and diagnosis: 1; style adaptation: 1; regression analysis: 1; expressiveness: 1; speaking style: 1; style feature: 1; multi task learning: 1; acoustic model: 1; structured output layer: 1; deep bidirectional long short term memory: 1; variance regularisation: 1; gpu: 1; graphics processing units: 1; pipelined training: 1; noise contrastive: 1; estimation: 1; audio segmentation: 1; deep neural network: 1; television broadcasting: 1; pattern clustering: 1; multi genre broadcast data: 1; error analysis: 1
Most Publications: 2022: 44, 2021: 38, 2020: 27, 2019: 18, 2015: 14


TASLP2022 Mengzhe Geng, Xurong Xie, Zi Ye, Tianzi Wang, Guinan Li, Shujie Hu, Xunying Liu, Helen Meng, 
Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition.

TASLP2022 Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.

TASLP2022 Boyang Xue, Shoukang Hu, Junhao Xu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Bayesian Neural Network Language Modeling for Speech Recognition.

ICASSP2022 Guinan Li, Jianwei Yu, Jiajun Deng, Xunying Liu, Helen Meng, 
Audio-Visual Multi-Channel Speech Separation, Dereverberation and Recognition.

ICASSP2022 Hang Su, Danyang Zhao, Long Dang, Minglei Li, Xixin Wu, Xunying Liu, Helen Meng, 
A Multitask Learning Framework for Speaker Change Detection with Content Information from Unsupervised Speech Decomposition.

ICASSP2022 Disong Wang, Songxiang Liu, Xixin Wu, Hui Lu, Lifa Sun, Xunying Liu, Helen Meng, 
Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation.

ICASSP2022 Xixin Wu, Shoukang Hu, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Neural Architecture Search for Speech Emotion Recognition.

ICASSP2022 Junhao Xu, Jianwei Yu, Xunying Liu, Helen Meng, 
Mixed Precision DNN Quantization for Overlapped Speech Separation and Recognition.

ICASSP2022 Naijun Zheng, Na Li 0012, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
Multi-Channel Speaker Diarization Using Spatial Features for Meetings.

Interspeech2022 Mingyu Cui, Jiajun Deng, Shoukang Hu, Xurong Xie, Tianzi Wang, Shujie Hu, Mengzhe Geng, Boyang Xue, Xunying Liu, Helen Meng, 
Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems.

Interspeech2022 Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Mengzhe Geng, Guinan Li, Xunying Liu, Helen Meng, 
Confidence Score Based Conformer Speaker Adaptation for Speech Recognition.

Interspeech2022 Jinchao Li, Shuai Wang, Yang Chao, Xunying Liu, Helen Meng, 
Context-aware Multimodal Fusion for Emotion Recognition.

Interspeech2022 Tianzi Wang, Jiajun Deng, Mengzhe Geng, Zi Ye, Shoukang Hu, Yi Wang, Mingyu Cui, Zengrui Jin, Xunying Liu, Helen Meng, 
Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection.

Interspeech2022 Yi Wang, Tianzi Wang, Zi Ye, Lingwei Meng, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng, 
Exploring linguistic feature and model combination for speech recognition based automatic AD detection.

Interspeech2022 Junhao Xu, Shoukang Hu, Xunying Liu, Helen Meng, 
Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus.

TASLP2021 Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng, 
Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition.

TASLP2021 Songxiang Liu, Yuewen Cao, Disong Wang, Xixin Wu, Xunying Liu, Helen Meng, 
Any-to-Many Voice Conversion With Location-Relative Sequence-to-Sequence Modeling.

TASLP2021 Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng, 
Recent Progress in the CUHK Dysarthric Speech Recognition System.

TASLP2021 Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Shiyin Kang, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Exemplar-Based Emotive Speech Synthesis.

TASLP2021 Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Disong Wang, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Speech Emotion Recognition Using Sequential Capsule Networks.

#12  | Lei Xie 0001 | Google Scholar   DBLP
Venues: Interspeech: 50, ICASSP: 22, TASLP: 4, SpeechComm: 1
Years: 2022: 19, 2021: 15, 2020: 9, 2019: 15, 2018: 8, 2017: 4, 2016: 7
ISCA Sections: speech synthesis: 11; voice conversion and adaptation: 4; spoken term detection: 3; speaker and language recognition: 2; asr: 2; adjusting to speaker, accent, and domain: 2; novel models and training methods for asr: 1; multi-, cross-lingual and other topics in asr: 1; other topics in speech recognition: 1; spoofing-aware automatic speaker verification (sasv): 1; dereverberation and echo cancellation: 1; tools, corpora and resources: 1; non-autoregressive sequential modeling for speech processing: 1; interspeech 2021 deep noise suppression challenge: 1; resource-constrained asr: 1; search/decoding techniques and confidence measures for asr: 1; interspeech 2021 acoustic echo cancellation challenge: 1; robust speaker recognition: 1; deep noise suppression challenge: 1; summarization, semantic analysis and classification: 1; the attacker’s perspective on automatic speaker verification: 1; streaming asr: 1; model adaptation for asr: 1; asr for noisy and far-field speech: 1; cross-lingual and multilingual asr: 1; speech technologies for code-switching in multilingual communities: 1; extracting information from audio: 1; robust speech recognition: 1; search, computational strategies and language modeling: 1; voice conversion: 1; resources and annotation of resources: 1; feature extraction and acoustic modeling using neural networks for asr: 1
IEEE Keywords: speech recognition: 12; natural language processing: 9; speaker recognition: 6; speech synthesis: 4; automatic speech recognition: 4; decoding: 3; end to end speech recognition: 3; text analysis: 2; linguistics: 2; reverberation: 2; end to end asr: 2; audio signal processing: 2; speech coding: 2; gradient methods: 2; keyword spotting: 2; end to end: 2; attention: 2; domain adversarial training: 2; attention based model: 2; spoken term detection: 2; style transfer: 1; variational inference: 1; disjoint datasets: 1; autoregressive processes: 1; style and speaker attributes: 1; neural tts: 1; computational linguistics: 1; text to speech (tts): 1; long form: 1; cross sentence: 1; modulation: 1; medical signal processing: 1; speech enhancement and dereverberation: 1; speech enhancement: 1; uformer: 1; encoder decoder attention: 1; hybrid encoder and decoder: 1; filtering theory: 1; dilated complex dual path conformer: 1; topic related rescoring: 1; conversational asr: 1; latent variational module: 1; multi speaker asr: 1; alimeeting: 1; speaker diarization: 1; m2met: 1; meeting transcription: 1; microphone arrays: 1; adversarial learning: 1; vocoders: 1; normalizing flows: 1; variational autoencoder: 1; music: 1; singing voice synthesis: 1; lattice pruning: 1; lattice generation: 1; decoder: 1; acoustic modeling: 1; accented speech recognition: 1; accent recognition: 1; convolutional neural nets: 1; voice activity detection: 1; transformer: 1; lf mmi: 1; streaming: 1; computational complexity: 1; wake word detection: 1; wavenet adaptation: 1; singular value decomposition (svd): 1; singular value decomposition: 1; voice conversion (vc): 1; speech bandwidth extension: 1; multi scale fusion: 1; sensor fusion: 1; signal restoration: 1; time domain analysis: 1; data mining: 1; document image processing: 1; neural net architecture: 1; wake up word detection: 1; class imbalance: 1; hard examples: 1; adversarial training: 1; sequence to sequence: 1; interference suppression: 1; error statistics: 1; cross entropy: 1; listen attend and spell: 1; statistical distributions: 1; virtual adversarial training: 1; asr: 1; computer aided instruction: 1; esl: 1; call: 1; language model: 1; code switching: 1; pattern classification: 1; kws: 1; adversarial examples: 1; permutation invariant training: 1; pitch tracking: 1; source separation: 1; speech separation: 1; deep clustering: 1; self attention: 1; relative position aware representation: 1; recurrent neural nets: 1; sequence to sequence model: 1; text to speech synthesis: 1; audio visual speech recognition: 1; audio visual systems: 1; multi condition training: 1; robust speech recognition: 1; dropout: 1; bimodal df smn: 1; gaussian noise: 1; attention model: 1; voice search: 1; unsupervised domain adaptation: 1; laplacian eigenmaps: 1; probability: 1; laplacian probabilistic latent semantic analysis: 1; graph regularization: 1; matrix algebra: 1; graph theory: 1; topic modeling: 1; data structures: 1; topic segmentation: 1; data reduction: 1; pattern matching: 1; pairwise learning: 1; autoencoder: 1; low resource speech processing: 1; bottleneck features: 1; sparse representation: 1; signal representation: 1; timbre: 1; voice conversion: 1; exemplar: 1; prosody: 1; query processing: 1; data augmentation: 1; time series: 1; dtw: 1; partial matching: 1; query by example: 1
Most Publications: 2022: 81, 2021: 70, 2020: 40, 2019: 37, 2018: 24

Affiliations
Northwestern Polytechnical University, School of Computer Science, Xi'an, China
The Chinese University of Hong Kong, Department of Systems Engineering and Engineering Management, Hong Kong (2006 - 2007)
City University of Hong Kong, School of Creative Media, Hong Kong (2004 - 2006)
Northwestern Polytechnical University, Xi'an, China (PhD 2004)
Vrije Universiteit Brussel, Department of Electronics and Information Processing, Belgium (2001 - 2002)

TASLP2022 Xiaochun An, Frank K. Soong, Lei Xie 0001
Disentangling Style and Speaker Attributes for TTS Style Transfer.

TASLP2022 Liumeng Xue, Frank K. Soong, Shaofei Zhang, Lei Xie 0001
ParaTTS: Learning Linguistic and Prosodic Cross-Sentence Information in Paragraph-Based TTS.

ICASSP2022 Yihui Fu, Yun Liu, Jingdong Li, Dawei Luo, Shubo Lv, Yukai Jv, Lei Xie 0001
Uformer: A Unet Based Dilated Complex & Real Dual-Path Conformer Network for Simultaneous Speech Enhancement and Dereverberation.

ICASSP2022 Kun Wei, Yike Zhang, Sining Sun, Lei Xie 0001, Long Ma, 
Conversational Speech Recognition by Learning Conversation-Level Characteristics.

ICASSP2022 Fan Yu, Shiliang Zhang, Pengcheng Guo, Yihui Fu, Zhihao Du, Siqi Zheng, Weilong Huang, Lei Xie 0001, Zheng-Hua Tan, DeLiang Wang, Yanmin Qian, Kong Aik Lee, Zhijie Yan, Bin Ma 0001, Xin Xu, Hui Bu, 
Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.

ICASSP2022 Yongmao Zhang, Jian Cong, Heyang Xue, Lei Xie 0001, Pengcheng Zhu, Mengxiao Bi, 
VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis.

Interspeech2022 Yi Lei, Shan Yang, Jian Cong, Lei Xie 0001, Dan Su 0002, 
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion.

Interspeech2022 Tao Li, Xinsheng Wang, Qicong Xie, Zhichao Wang, Mingqi Jiang, Lei Xie 0001
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis.

Interspeech2022 Qijie Shao, Jinghao Yan, Jian Kang 0006, Pengcheng Guo, Xian Shi, Pengfei Hu, Lei Xie 0001
Linguistic-Acoustic Similarity Based Accent Shift for Accent Recognition.

Interspeech2022 Yu Wang, Xinsheng Wang, Pengcheng Zhu, Jie Wu, Hanzhao Li, Heyang Xue, Yongmao Zhang, Lei Xie 0001, Mengxiao Bi, 
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis.

Interspeech2022 Kun Wei, Yike Zhang, Sining Sun, Lei Xie 0001, Long Ma, 
Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR.

Interspeech2022 Heyang Xue, Xinsheng Wang, Yongmao Zhang, Lei Xie 0001, Pengcheng Zhu, Mengxiao Bi, 
Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher.

Interspeech2022 Liumeng Xue, Shan Yang, Na Hu, Dan Su 0002, Lei Xie 0001
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers.

Interspeech2022 Zhanheng Yang, Hang Lv 0001, Xiong Wang, Ao Zhang, Lei Xie 0001
Minimizing Sequential Confusion Error in Speech Command Recognition.

Interspeech2022 Zhanheng Yang, Sining Sun, Jin Li, Xiaoming Zhang, Xiong Wang, Long Ma, Lei Xie 0001
CaTT-KWS: A Multi-stage Customized Keyword Spotting Framework based on Cascaded Transducer-Transformer.

Interspeech2022 Fan Yu, Zhihao Du, Shiliang Zhang, Yuxiao Lin, Lei Xie 0001
A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings.

Interspeech2022 Li Zhang 0084, Yue Li, Huan Zhao, Qing Wang 0039, Lei Xie 0001
Backend Ensemble for Speaker Verification and Spoofing Countermeasure.

Interspeech2022 Shimin Zhang, Ziteng Wang, Yukai Ju, Yihui Fu, Yueyue Na, Qiang Fu, Lei Xie 0001
Personalized Acoustic Echo Cancellation for Full-duplex Communications.

Interspeech2022 Binbin Zhang, Di Wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv 0001, Lei Xie 0001, Chao Yang, Fuping Pan, Jianwei Niu 0002, 
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit.

SpeechComm2021 Hongqiang Du, Xiaohai Tian, Lei Xie 0001, Haizhou Li 0001, 
Factorized WaveNet for voice conversion with limited data.

#13  | Hung-yi Lee | Google Scholar   DBLP
Venues: Interspeech: 43, ICASSP: 24, TASLP: 8, ACL: 2
Years: 2022: 20, 2021: 14, 2020: 14, 2019: 13, 2018: 7, 2017: 5, 2016: 4
ISCA Sections: speech synthesis: 4; spoken language processing: 2; adaptation, transfer learning, and distillation for asr: 2; voice conversion and adaptation: 2; new trends in self-supervised speech processing: 2; neural techniques for voice conversion and waveform generation: 2; spoken term detection: 2; dialogue systems and analysis of dialogue: 2; speech analysis: 1; the voicemos challenge: 1; trustworthy speech processing: 1; spoofing-aware automatic speaker verification (sasv): 1; embedding and network architecture for speaker recognition: 1; neural network training methods for asr: 1; source separation: 1; spoken term detection & voice search: 1; voice anti-spoofing and countermeasure: 1; speech signal analysis and representation: 1; search for speech recognition: 1; conversational systems: 1; speech synthesis paradigms and methods: 1; applications of language technologies: 1; language learning and databases: 1; speech enhancement: 1; the zero resource speech challenge 2019: 1; turn management in dialogue: 1; speech and audio source separation and scene analysis: 1; voice conversion: 1; extracting information from audio: 1; spoken language understanding: 1; acoustic modelling: 1; spoken document processing: 1; speech and audio segmentation and classification: 1
IEEE Keywords: speech recognition: 17; speaker recognition: 11; natural language processing: 9; speech synthesis: 7; unsupervised learning: 5; security of data: 4; voice conversion: 4; meta learning: 3; self supervised learning: 3; speech representation learning: 3; adversarial attack: 3; audio signal processing: 3; interactive systems: 3; automatic speech recognition: 3; signal representation: 3; text analysis: 3; speech coding: 2; few shot: 2; maml: 2; hidden markov models: 2; supervised learning: 2; biometrics (access control): 2; anti spoofing: 2; speech separation: 2; source separation: 2; low resource: 2; end to end: 2; adversarial training: 2; reinforcement learning: 2; signal sampling: 1; speaker adaptation: 1; tts: 1; phone recognition: 1; generative adversarial network: 1; pattern classification: 1; adversarial attacks: 1; automatic speaker verification: 1; adversarial defense: 1; data handling: 1; model compression: 1; knowledge distillation: 1; noise robustness: 1; speech enhancement: 1; voice activity detection: 1; self supervised speech representation: 1; computer based training: 1; open source: 1; self supervised speech models: 1; data bias: 1; superb benchmark: 1; speaker verification: 1; vocoder: 1; vocoders: 1; partially fake audio detection: 1; audio deep synthesis detection challenge: 1; language translation: 1; sensor fusion: 1; decoding: 1; speech translation: 1; self supervised: 1; pre training: 1; representation: 1; adaptive instance normalization: 1; disentangled representations: 1; activation guidance: 1; speaker representation: 1; multi speaker text to speech: 1; semi supervised learning: 1; spoken language understanding: 1; any to any: 1; transformer: 1; attention mechanism: 1; concatenative: 1; anil: 1; language adaptation: 1; iarpa babel: 1; interpretability: 1; analysis: 1; speech representation: 1; quantisation (signal): 1; representation quantization: 1; linguistics: 1; transformer encoders: 1; unsupervised training: 1; spatial smoothing: 1; spoofing countermeasure: 1; permutation invariant training: 1; label ambiguity problem: 1; cocktail party problem: 1; toefl: 1; computer aided instruction: 1; attention model: 1; squad: 1; speech question answering: 1; sqa: 1; domain adaptation: 1; adversarial learning: 1; spoken question answering: 1; criticizing language model: 1; deep reinforcement learning: 1; deep q network: 1; dialogue state tracking: 1; content based retrieval: 1; spoken content retrieval: 1; deep q learning: 1; user machine interaction: 1; information retrieval: 1; key term extraction: 1; long short term memory (lstm): 1; spoken content: 1; internet: 1; domain independent: 1; word processing: 1; audio word2vec: 1; spoken term detection: 1; autoencoder: 1; language transfer: 1; seq2seq: 1; recurrent neural network: 1; social networking (online): 1; recurrent neural nets: 1; social network: 1; personalized language modeling: 1; mobile computing: 1; rnnlm: 1; computational complexity: 1; neural turing machine: 1; turing machines: 1; multitask learning: 1; transfer learning: 1; deep neural network: 1; speech adaptation: 1
Most Publications: 2022: 90, 2021: 57, 2020: 54, 2019: 51, 2018: 38


TASLP2022 Sung-Feng Huang, Chyi-Jiunn Lin, Da-Rong Liu, Yi-Chen Chen, Hung-yi Lee
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech.

TASLP2022 Da-Rong Liu, Po-Chun Hsu, Yi-Chen Chen, Sung-Feng Huang, Shun-Po Chuang, Da-Yi Wu, Hung-yi Lee
Learning Phone Recognition From Unpaired Audio and Phone Sequences Based on Generative Adversarial Network.

TASLP2022 Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu 0001, Helen Meng, Hung-Yi Lee
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning.

ICASSP2022 Heng-Jui Chang, Shu-Wen Yang, Hung-yi Lee
Distilhubert: Speech Representation Learning by Layer-Wise Distillation of Hidden-Unit Bert.

ICASSP2022 Chien-yu Huang, Kai-Wei Chang, Hung-Yi Lee
Toward Degradation-Robust Voice Conversion.

ICASSP2022 Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe 0001, Tomoki Toda, 
S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations.

ICASSP2022 Yen Meng, Yi-Hui Chou, Andy T. Liu, Hung-yi Lee
Don't Speak Too Fast: The Impact of Data Bias on Self-Supervised Speech Models.

ICASSP2022 Haibin Wu, Po-Chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang 0006, Zhiyong Wu 0001, Helen Meng, Hung-Yi Lee
Adversarial Sample Detection for Speaker Verification by Neural Vocoders.

ICASSP2022 Haibin Wu, Heng-Cheng Kuo, Naijun Zheng, Kuo-Hsuan Hung, Hung-Yi Lee, Yu Tsao 0001, Hsin-Min Wang, Helen Meng, 
Partially Fake Audio Detection by Self-Attention-Based Fake Span Discovery.

Interspeech2022 Chih-Chiang Chang, Hung-yi Lee
Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation.

Interspeech2022 Kai-Wei Chang, Wei-Cheng Tseng, Shang-Wen Li 0001, Hung-yi Lee
An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks.

Interspeech2022 Wei-Ping Huang, Po-Chun Chen, Sung-Feng Huang, Hung-yi Lee
Few Shot Cross-Lingual TTS Using Transferable Phoneme Embedding.

Interspeech2022 Kuan-Po Huang, Yu-Kuan Fu, Yu Zhang 0033, Hung-yi Lee
Improving Distortion Robustness of Self-supervised Speech Processing Tasks with Domain Adaptation.

Interspeech2022 Guan-Ting Lin, Shang-Wen Li 0001, Hung-yi Lee
Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition.

Interspeech2022 Guan-Ting Lin, Yung-Sung Chuang, Ho-Lam Chung, Shu-Wen Yang, Hsuan-Jui Chen, Shuyan Annie Dong, Shang-Wen Li 0001, Abdelrahman Mohamed, Hung-yi Lee, Lin-Shan Lee, 
DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering.

Interspeech2022 Wei-Cheng Tseng, Wei-Tsung Kao, Hung-yi Lee
DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores.

Interspeech2022 Wei-Cheng Tseng, Wei-Tsung Kao, Hung-yi Lee
Membership Inference Attacks Against Self-supervised Speech Models.

Interspeech2022 Haibin Wu, Lingwei Meng, Jiawen Kang, Jinchao Li, Xu Li, Xixin Wu, Hung-yi Lee, Helen Meng, 
Spoofing-Aware Speaker Verification by Multi-Level Fusion.

Interspeech2022 Yang Zhang, Zhiqiang Lv, Haibin Wu, Shanshan Zhang, Pengfei Hu, Zhiyong Wu 0001, Hung-yi Lee, Helen Meng, 
MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification.

ACL2022 Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-Wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li 0001, Shinji Watanabe 0001, Abdelrahman Mohamed, Hung-yi Lee
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities.

#14  | Jinyu Li 0001 | Google Scholar   DBLP
Venues: Interspeech: 38, ICASSP: 32, TASLP: 4, ACL: 1
Years: 2022: 17, 2021: 16, 2020: 15, 2019: 10, 2018: 9, 2017: 4, 2016: 4
ISCA Sections: novel models and training methods for asr: 3; source separation: 3; asr neural network architectures: 3; streaming for asr/rnn transducers: 2; multi- and cross-lingual asr, other topics in asr: 2; streaming asr: 2; speaker and language recognition: 1; other topics in speech recognition: 1; robust asr, and far-field/multi-talker asr: 1; spoken language processing: 1; topics in asr: 1; self-supervision and semi-supervision for neural asr training: 1; neural network training methods for asr: 1; language and lexical modeling for asr: 1; asr model training and strategies: 1; acoustic model adaptation for asr: 1; new trends in self-supervised speech processing: 1; asr neural network architectures and training: 1; search for speech recognition: 1; multi-channel speech enhancement: 1; spoken term detection, confidence measure, and end-to-end speech recognition: 1; asr neural network training: 1; neural network training strategies for asr: 1; novel neural network architectures for acoustic modelling: 1; novel approaches to enhancement: 1; deep enhancement: 1; noise reduction: 1; acoustic model adaptation: 1; neural networks in speech recognition: 1
IEEE Keywords: speech recognition: 27; recurrent neural nets: 12; speaker recognition: 11; natural language processing: 7; automatic speech recognition: 6; speaker adaptation: 4; deep neural network: 4; attention: 4; ctc: 4; speech enhancement: 3; self supervised learning: 3; continuous speech separation: 3; source separation: 3; transformer: 3; unsupervised learning: 3; lstm: 3; teacher student learning: 3; signal classification: 3; adversarial learning: 3; multi talker asr: 2; transducer: 2; meeting transcription: 2; speech separation: 2; decoding: 2; encoding: 2; end to end: 2; audio signal processing: 2; permutation invariant training: 2; acoustic to word: 2; vocabulary: 2; oov: 2; neural network: 2; end to end training: 2; deep neural networks: 2; matrix algebra: 2; contextual spelling correction: 1; contextual biasing: 1; non autoregressive: 1; streaming: 1; end to end end point detection: 1; dual path rnn: 1; long form meeting transcription: 1; representation learning: 1; contrastive learning: 1; wav2vec 2.0: 1; robust speech recognition: 1; robust automatic speech recognition: 1; supervised learning: 1; recurrent selective attention network: 1; transformer transducer: 1; configurable multilingual model: 1; multilingual speech recognition: 1; signal representation: 1; multi channel microphone: 1; deep learning (artificial intelligence): 1; real time decoding: 1; conformer: 1; multi speaker asr: 1; recurrent neural network transducer: 1; attention based encoder decoder: 1; language model: 1; combination: 1; segmentation: 1; filtering theory: 1; speaker diarization: 1; system fusion: 1; acoustic model adaptation: 1; speech synthesis: 1; neural language generation: 1; libricss: 1; microphones: 1; overlapped speech: 1; streaming attention based sequence to sequence asr: 1; pattern classification: 1; latency reduction: 1; monotonic chunkwise attention: 1; latency: 1; computer aided instruction: 1; entropy: 1; domain adaptation: 1; backpropagation: 1; knowledge representation: 1; label embedding: 1; speech coding: 1; end to end system: 1; universal acoustic model: 1; mixture models: 1; adaptation: 1; interpolation: 1; mixture of experts: 1; senone classification: 1; future context frames: 1; layer trajectory: 1; temporal modeling: 1; asr: 1; language identification: 1; code switching: 1; domain invariant training: 1; speaker verification: 1; unsupervised single channel overlapped speech recognition: 1; sequence discriminative training: 1; transfer learning: 1; progressive joint training: 1; acoustic modeling: 1; far field: 1; acoustic model: 1; spotting: 1; data compression: 1; speaker invariant training: 1; probability: 1; adversarial learning: 1; mean square error methods: 1; cepstra minimum mean square error: 1; cepstral analysis: 1; smoothing methods: 1; noise robustness: 1; feedforward neural nets: 1; recurrent neural network: 1; long short term memory: 1; sequence training: 1; support vector machines: 1; maximum margin: 1; svm: 1
Most Publications: 2022: 47, 2021: 44, 2020: 39, 2018: 25, 2019: 24

Affiliations
Microsoft Corporation, Redmond, WA, USA
Georgia Institute of Technology, Center for Signal and Image Processing, Atlanta, GA, USA (PhD)
University of Science and Technology of China, iFlytek Speech Lab, Hefei, China

TASLP2022 Xiaoqiang Wang, Yanqing Liu, Jinyu Li 0001, Veljko Miljanic, Sheng Zhao, Hosam Khalil, 
Towards Contextual Spelling Correction for Customization of End-to-End Speech Recognition Systems.

ICASSP2022 Liang Lu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Endpoint Detection for Streaming End-to-End Multi-Talker ASR.

ICASSP2022 Desh Raj, Liang Lu 0001, Zhuo Chen 0006, Yashesh Gaur, Jinyu Li 0001
Continuous Streaming Multi-Talker ASR with Dual-Path Transducers.

ICASSP2022 Yiming Wang, Jinyu Li 0001, Heming Wang, Yao Qian, Chengyi Wang 0002, Yu Wu 0012, 
Wav2vec-Switch: Contrastive Learning from Original-Noisy Speech Pairs for Robust Speech Recognition.

ICASSP2022 Heming Wang, Yao Qian, Xiaofei Wang 0009, Yiming Wang, Chengyi Wang 0002, Shujie Liu 0001, Takuya Yoshioka, Jinyu Li 0001, DeLiang Wang, 
Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction.

ICASSP2022 Chengyi Wang 0002, Yu Wu 0012, Sanyuan Chen, Shujie Liu 0001, Jinyu Li 0001, Yao Qian, Zhenglu Yang, 
Improving Self-Supervised Learning for Speech Recognition with Intermediate Layer Supervision.

ICASSP2022 Yixuan Zhang, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li 0001
Continuous Speech Separation with Recurrent Selective Attention Network.

ICASSP2022 Long Zhou, Jinyu Li 0001, Eric Sun, Shujie Liu 0001, 
A Configurable Multilingual Model is All You Need to Recognize All Languages.

Interspeech2022 Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu 0001, Haizhou Li 0001, Tom Ko, Lirong Dai 0001, Jinyu Li 0001, Yao Qian, Furu Wei, 
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data.

Interspeech2022 Sanyuan Chen, Yu Wu 0012, Chengyi Wang 0002, Shujie Liu 0001, Zhuo Chen 0006, Peidong Wang, Gang Liu, Jinyu Li 0001, Jian Wu 0027, Xiangzhan Yu, Furu Wei, 
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

Interspeech2022 Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.

Interspeech2022 Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.

Interspeech2022 Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li 0001, Xie Chen 0001, Yu Wu 0012, Yifan Gong 0001, 
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.

Interspeech2022 Chengyi Wang 0002, Yiming Wang, Yu Wu 0012, Sanyuan Chen, Jinyu Li 0001, Shujie Liu 0001, Furu Wei, 
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training.

Interspeech2022 Jian Xue, Peidong Wang, Jinyu Li 0001, Matt Post, Yashesh Gaur, 
Large-Scale Streaming End-to-End Speech Translation with Neural Transducers.

Interspeech2022 Wangyou Zhang, Zhuo Chen 0006, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei, 
Separating Long-Form Speech with Group-wise Permutation Invariant Training.

ACL2022 Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang 0002, Shuo Ren, Yu Wu 0012, Shujie Liu 0001, Tom Ko, Qing Li, Yu Zhang 0006, Zhihua Wei, Yao Qian, Jinyu Li 0001, Furu Wei, 
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing.

ICASSP2021 Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Takuya Yoshioka, Shujie Liu 0001, Jinyu Li 0001, Xiangzhan Yu, 
Don't Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer.

ICASSP2021 Xie Chen 0001, Yu Wu 0012, Zhenghao Wang, Shujie Liu 0001, Jinyu Li 0001
Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset.

ICASSP2021 Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Jinyu Li 0001, Takuya Yoshioka, Chengyi Wang 0002, Shujie Liu 0001, Ming Zhou 0001, 
Continuous Speech Separation with Conformer.

#15  | Jim Glass | Google Scholar   DBLP
Venues
Interspeech: 37; ICASSP: 24; TASLP: 5; NeurIPS: 4; ACL: 3; AAAI: 1; ICLR: 1

Years
2022: 6; 2021: 12; 2020: 12; 2019: 17; 2018: 11; 2017: 8; 2016: 9

ISCA Sections
language recognition: 3; new trends in self-supervised speech processing: 2; speech synthesis: 1; acoustic event detection and acoustic scene classification: 1; assessment of pathological speech and language: 1; non-autoregressive sequential modeling for speech processing: 1; spoken dialogue systems: 1; tools, corpora and resources: 1; multimodal systems: 1; low-resource speech recognition: 1; speech signal representation: 1; spoken dialogue system: 1; speech translation and multilingual/multimodal learning: 1; speaker recognition challenges and applications: 1; zero-resource asr: 1; end-to-end speech recognition: 1; speech signal characterization: 1; speech and audio classification: 1; speech and audio source separation and scene analysis: 1; speech recognition and beyond: 1; dialogue speech understanding: 1; applications of language technologies: 1; speaker recognition and diarization: 1; speaker recognition: 1; sequence models for asr: 1; integrating speech science and technology for clinical applications: 1; deep neural networks: 1; robust speech recognition: 1; neural network training strategies for asr: 1; music and audio processing: 1; voice conversion: 1; language understanding and generation: 1; new trends in neural networks for speech recognition: 1; decoding, system combination: 1

IEEE Keywords
speech recognition: 19; natural language processing: 13; speaker recognition: 4; crowdsourcing: 4; feedforward neural nets: 4; information retrieval: 3; pattern classification: 3; speech synthesis: 3; unsupervised learning: 3; dialect identification: 3; unsupervised speech processing: 3; natural language interfaces: 2; support vector machines: 2; audio signal processing: 2; self supervised learning: 2; recurrent neural nets: 2; speech representation learning: 2; image representation: 2; interactive systems: 2; text analysis: 2; self attention: 2; language identification: 2; convolutional neural nets: 2; vision and language: 2; speech coding: 2; convolution: 2; convolutional neural networks: 2; probability: 2; word vectors: 2; semantic tagging: 2; bottleneck features: 2; transformer: 1; pronunciation assessment: 1; transformers: 1; speech: 1; audio classification: 1; corpus: 1; vocal sounds: 1; condition monitoring: 1; signal classification: 1; self training: 1; cross lingual transfer learning: 1; asr: 1; adaptation: 1; efficiency: 1; speech intelligibility: 1; pruning: 1; vocoders: 1; vocoder: 1; text to speech: 1; signal sampling: 1; ensemble: 1; transfer learning: 1; noisy label: 1; audio event classification: 1; imbalanced learning: 1; audio tagging: 1; comparative analysis: 1; unsupervised pre training: 1; semi supervised learning: 1; spoken language understanding: 1; wordpiece: 1; end to end: 1; maximum likelihood estimation: 1; subword: 1; and cross lingual retrieval: 1; semantic embedding space: 1; vision and spoken language: 1; arabic dialect: 1; large scale: 1; social networking (online): 1; dataset: 1; semantic embedding: 1; query processing: 1; query languages: 1; reinforcement learning: 1; convolutional neural network: 1; dialogue system: 1; entropy: 1; gaussian processes: 1; brain: 1; dnns: 1; cepstral analysis: 1; speaker verification: 1; bottleneck feature: 1; image segmentation: 1; gmm ubm: 1; pattern clustering: 1; time contrastive learning: 1; language translation: 1; speech2vec: 1; bilingual lexicon induction: 1; speech to text translation: 1; multimodal speech processing: 1; variational autoencoder: 1; adversarial training: 1; text to speech synthesis: 1; data augmentation: 1; fusion: 1; cross modal: 1; face recognition: 1; recognition: 1; person verification: 1; missing data: 1; multi modal: 1; attention: 1; image fusion: 1; cross lingual speech retrieval: 1; image recognition: 1; domain invariant representations: 1; robust speech recognition: 1; factorized hierarchical variational autoencoder: 1; energy: 1; low precision: 1; on chip memory: 1; field programmable gate arrays: 1; floating point arithmetic: 1; inference mechanisms: 1; power consumption: 1; quantisation (signal): 1; energy conservation: 1; costing: 1; fpga: 1; speaker identification: 1; multitask learning: 1; semantic embeddings: 1; production engineering computing: 1; food products: 1; reranking: 1; phonotactics: 1; cnn: 1; medical computing: 1; conditional random field: 1; patient treatment: 1; mobile computing: 1; finite state transducers: 1; database management systems: 1; markov processes: 1; senone posteriors: 1; i vector: 1; deep neural networks (dnns): 1; acoustic unit discovery (aud): 1; language recognition: 1; multilingual: 1; data selection: 1; dnn: 1; natural languages: 1; crf: 1; extended recognition network (ern): 1; decoding: 1; error correction: 1; dynamic time warping (dtw): 1; computer assisted pronunciation training (capt): 1; highway lstm: 1; lstm: 1; cntk: 1; sequence training: 1

Most Publications
2019: 68; 2018: 60; 2020: 38; 2021: 36; 2022: 32

Affiliations
Massachusetts Institute of Technology (MIT), CSAIL, Cambridge, MA, USA

ICASSP2022 Yuan Gong, Ziyi Chen, Iek-Heng Chu, Peng Chang 0002, James R. Glass
Transformer-Based Multi-Aspect Multi-Granularity Non-Native English Speaker Pronunciation Assessment.

ICASSP2022 Yuan Gong, Jin Yu, James R. Glass
Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition.

ICASSP2022 Sameer Khurana, Antoine Laurent, James R. Glass
Magic Dust for Cross-Lingual Adaptation of Monolingual Wav2vec-2.0.

ICASSP2022 Cheng-I Jeff Lai, Erica Cooper, Yang Zhang 0001, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David D. Cox, James R. Glass
On the Interplay between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis.

Interspeech2022 Alexander H. Liu, Cheng-I Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James R. Glass
Simple and Effective Unsupervised Speech Synthesis.

AAAI2022 Yuan Gong, Cheng-I Lai, Yu-An Chung, James R. Glass
SSAST: Self-Supervised Audio Spectrogram Transformer.

TASLP2021 Yuan Gong, Yu-An Chung, James R. Glass
PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation.

ICASSP2021 Yu-An Chung, Yonatan Belinkov, James R. Glass
Similarity Analysis of Self-Supervised Speech Representations.

ICASSP2021 Cheng-I Lai, Yung-Sung Chuang, Hung-Yi Lee, Shang-Wen Li 0001, James R. Glass
Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining.

Interspeech2021 Yuan Gong, Yu-An Chung, James R. Glass
AST: Audio Spectrogram Transformer.

Interspeech2021 R'mani Haulcy, James R. Glass
CLAC: A Speech Corpus of Healthy English Speakers.

Interspeech2021 Alexander H. Liu, Yu-An Chung, James R. Glass
Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies.

Interspeech2021 Hongyin Luo, James R. Glass, Garima Lalwani, Yi Zhang, Shang-Wen Li 0001, 
Joint Retrieval-Extraction Training for Evidence-Aware Dialog Response Selection.

Interspeech2021 Ian Palmer, Andrew Rouditchenko, Andrei Barbu, Boris Katz, James R. Glass
Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset.

Interspeech2021 Andrew Rouditchenko, Angie W. Boggust, David Harwath, Samuel Thomas 0001, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogério Feris, Brian Kingsbury, Michael Picheny, James R. Glass
Cascaded Multilingual Audio-Visual Learning from Videos.

Interspeech2021 Andrew Rouditchenko, Angie W. Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas 0001, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogério Schmidt Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba 0001, James R. Glass
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.

NeurIPS2021 Cheng-I Jeff Lai, Yang Zhang 0001, Alexander H. Liu, Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David D. Cox, Jim Glass
PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition.

ACL2021 Wei-Ning Hsu, David Harwath, Tyler Miller, Christopher Song, James R. Glass
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units.

ICASSP2020 Jennifer Drexler, James R. Glass
Learning a Subword Inventory Jointly with End-to-End Automatic Speech Recognition.

ICASSP2020 Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, David Harwath, James R. Glass
Trilingual Semantic Embeddings of Visually Grounded Speech with Self-Attention Mechanisms.

#16  | Prasanta Kumar Ghosh | Google Scholar   DBLP
Venues
Interspeech: 47; ICASSP: 22; TASLP: 3; SpeechComm: 2

Years
2022: 8; 2021: 10; 2020: 15; 2019: 13; 2018: 15; 2017: 8; 2016: 5

ISCA Sections
speech signal characterization: 4; show and tell: 3; speech and voice disorders: 3; human speech production: 3; bioacoustics and articulation: 3; articulatory information, modeling and inversion: 3; speech production: 2; speech signal analysis and representation: 2; source and supra-segmentals: 2; special session: 2; low-resource asr development: 1; speech production, perception and multimodality: 1; assessment of pathological speech and language: 1; cross/multi-lingual and code-switched asr: 1; the first dicova challenge: 1; diverse modes of speech acquisition and processing: 1; speech in health: 1; speaker recognition: 1; applications in language learning and healthcare: 1; deep enhancement: 1; source separation and spatial analysis: 1; voice conversion: 1; speech and singing production: 1; show and tell 6: 1; multimodal paralinguistics: 1; source separation and voice activity detection: 1; speech and audio segmentation and classification: 1; multimodal resources and annotation: 1; dialogue systems and analysis of dialogue: 1; speech enhancement and noise reduction: 1

IEEE Keywords
speech recognition: 10; acoustic to articulatory inversion: 6; speaker recognition: 5; signal classification: 5; whispered speech: 4; amyotrophic lateral sclerosis: 4; cepstral analysis: 4; diseases: 4; natural language processing: 3; convolutional neural nets: 3; blstm: 3; filtering theory: 3; parkinson’s disease: 2; correlation methods: 2; support vector machines: 2; electromagnetic articulograph: 2; audio signal processing: 2; cnn: 2; probability: 2; gibbs sampling: 2; neutral speech: 2; deep neural networks: 2; articulatory speech synthesis: 1; speech synthesis: 1; articulatory to acoustic forward mapping: 1; recording device: 1; dual attention pooling network: 1; velum: 1; biomedical mri: 1; 3 dimensional convolutional neural network: 1; convolution: 1; image segmentation: 1; image registration: 1; air tissue boundary segmentation: 1; medical image processing: 1; real time magnetic resonance imaging video: 1; tongue base: 1; pitch: 1; mel frequency cepstral coefficients: 1; model complexity: 1; noise: 1; transfer learning: 1; medical computing: 1; x vectors: 1; pitch drop: 1; speaking rate: 1; source filter interaction: 1; natural languages: 1; medical signal processing: 1; recurrent neural nets: 1; cnn lstm: 1; maximum likelihood estimation: 1; hidden markov models: 1; lf mmi: 1; adaptation: 1; pseudo likelihood correction technique: 1; attention network: 1; acoustic signal detection: 1; feature selection: 1; biology computing: 1; bioacoustics: 1; acoustic analysis: 1; cervical auscultation: 1; swallow sound signal: 1; gesture recognition: 1; lstm: 1; euler angles: 1; head gestures: 1; asthma: 1; sustained phonations: 1; opensmile: 1; classification: 1; exponential family distributions: 1; time varying: 1; latent variable model: 1; non negative: 1; nmf: 1; dirichlet distribution: 1; source separation: 1; expectation maximisation algorithm: 1; glottal inverse filtering: 1; gif: 1; probabilistic weighted linear prediction: 1; formants: 1; speaker verification: 1; amplitude modulation: 1; signal representation: 1; articulatory data: 1; automatic speech recognition: 1; dynamic programming: 1; database management systems: 1; gaussian distribution: 1; gci detection: 1; computational complexity: 1; bernoulli gaussian distribution: 1; real time mri videos: 1; image sequences: 1; computer aided instruction: 1; concatenative synthesis: 1; articulatory video synthesis: 1; video recording: 1; spoken language training: 1; video signal processing: 1; pattern classification: 1; electro magnetic articulography: 1; support vector machine: 1; articulatory kinematic features: 1; health care: 1; audio databases: 1; gating network: 1; speech enhancement: 1; electromagnetic articulography: 1; forward sub band selection: 1; sonority based features: 1; prominence measures: 1; sonorous tcssbc (s tcssbc): 1; syllable stress detection: 1; subject independent inversion: 1; broad class phonetic recognition: 1; acoustic normalization: 1; estimation theory: 1

Most Publications
2019: 26; 2021: 24; 2018: 24; 2020: 21; 2022: 17

Affiliations
Indian Institute of Science, Department of Electrical Engineering, Bangalore, India

SpeechComm2022 Chiranjeevi Yarra, Prasanta Kumar Ghosh
Automatic syllable stress detection under non-parallel label and data condition.

ICASSP2022 Aravind Illa, Aanish Nair, Prasanta Kumar Ghosh
The impact of cross language on acoustic-to-articulatory inversion and its influence on articulatory speech synthesis.

ICASSP2022 Abinay Reddy Naini, Bhavuk Singhal, Prasanta Kumar Ghosh
Dual Attention Pooling Network for Recording Device Classification Using Neutral and Whispered Speech.

ICASSP2022 Anwesha Roy, Varun Belagali, Prasanta Kumar Ghosh
An Error Correction Scheme for Improved Air-Tissue Boundary in Real-Time MRI Video for Speech Production.

Interspeech2022 Anish Bhanushali, Grant Bridgman, Deekshitha G, Prasanta Kumar Ghosh, Pratik Kumar, Saurabh Kumar, Adithya Raj Kolladath, Nithya Ravi, Aaditeshwar Seth, Ashish Seth, Abhayjeet Singh, Vrunda N. Sukhadia, Srinivasan Umesh, Sathvik Udupa, Lodagala V. S. V. Durga Prasad, 
Gram Vaani ASR Challenge on spontaneous telephone speech recordings in regional variations of Hindi.

Interspeech2022 Anwesha Roy, Varun Belagali, Prasanta Kumar Ghosh
Air tissue boundary segmentation using regional loss in real-time Magnetic Resonance Imaging video for speech production.

Interspeech2022 C. Siddarth, Sathvik Udupa, Prasanta Kumar Ghosh
Watch Me Speak: 2D Visualization of Human Mouth during Speech.

Interspeech2022 Sathvik Udupa, Aravind Illa, Prasanta Kumar Ghosh
Streaming model for Acoustic to Articulatory Inversion with transformer networks.

ICASSP2021 Tanuka Bhattacharjee, Jhansi Mallela, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Pradeep Reddy, Dipanjan Gope, Prasanta Kumar Ghosh, 
Effect of Noise and Model Complexity on Detection of Amyotrophic Lateral Sclerosis and Parkinson's Disease Using Pitch and MFCC.

ICASSP2021 Sarthak Kumar Maharana, Aravind Illa, Renuka Mannem, Yamini Belur, Preetie Shetty, Preethish-Kumar Veeramani, Seena Vengalil, Kiran Polavarapu, Atchayaram Nalini, Prasanta Kumar Ghosh
Acoustic-to-Articulatory Inversion for Dysarthric Speech by Using Cross-Corpus Acoustic-Articulatory Data.

ICASSP2021 Tilak Purohit, Achuth Rao M. V, Prasanta Kumar Ghosh
Impact of Speaking Rate on the Source Filter Interaction in Speech: A Study.

Interspeech2021 Tanuka Bhattacharjee, Jhansi Mallela, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Pradeep Reddy, Dipanjan Gope, Prasanta Kumar Ghosh
Source and Vocal Tract Cues for Speech-Based Classification of Patients with Parkinson's Disease and Healthy Subjects.

Interspeech2021 Anuj Diwan, Rakesh Vaideeswaran, Sanket Shah, Ankita Singh, Srinivasa Raghavan K. M., Shreya Khare, Vinit Unni, Saurabh Vyas, Akash Rajpuria, Chiranjeevi Yarra, Ashish R. Mittal, Prasanta Kumar Ghosh, Preethi Jyothi, Kalika Bali, Vivek Seshadri, Sunayana Sitaram, Samarth Bharadwaj, Jai Nanavati, Raoul Nanavati, Karthik Sankaranarayanan, 
MUCS 2021: Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages.

Interspeech2021 Ananya Muguli, Lancelot Pinto, Nirmala R., Neeraj Kumar Sharma 0001, Prashant Krishnan V, Prasanta Kumar Ghosh, Rohit Kumar, Shrirama Bhat, Srikanth Raj Chetupalli, Sriram Ganapathy, Shreyas Ramoji, Viral Nanda, 
DiCOVA Challenge: Dataset, Task, and Baseline System for COVID-19 Diagnosis Using Acoustics.

Interspeech2021 Manthan Sharma, Navaneetha Gaddam, Tejas Umesh, Aditya Murthy, Prasanta Kumar Ghosh
A Comparative Study of Different EMG Features for Acoustics-to-EMG Mapping.

Interspeech2021 Sathvik Udupa, Anwesha Roy, Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh
Estimating Articulatory Movements in Speech Production with Transformer Networks.

Interspeech2021 Sathvik Udupa, Anwesha Roy, Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh
Web Interface for Estimating Articulatory Movements in Speech Production from Acoustics and Text.

Interspeech2021 Chiranjeevi Yarra, Prasanta Kumar Ghosh
Noise Robust Pitch Stylization Using Minimum Mean Absolute Error Criterion.

ICASSP2020 Jhansi Mallela, Aravind Illa, Suhas B. N., Sathvik Udupa, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Pradeep Reddy, Dipanjan Gope, Prasanta Kumar Ghosh
Voice based classification of patients with Amyotrophic Lateral Sclerosis, Parkinson's Disease and Healthy Controls with CNN-LSTM using transfer learning.

ICASSP2020 Avni Rajpal, M. V. Achuth Rao, Chiranjeevi Yarra, Ritu Aggarwal, Prasanta Kumar Ghosh
Pseudo Likelihood Correction Technique for Low Resource Accented ASR.

#17  | Jianhua Tao | Google Scholar   DBLP
Venues
Interspeech: 45; ICASSP: 15; TASLP: 9; SpeechComm: 4

Years
2023: 1; 2022: 8; 2021: 15; 2020: 18; 2019: 12; 2018: 9; 2017: 5; 2016: 5

ISCA Sections
speech synthesis: 6; voice conversion and adaptation: 3; speech emotion recognition: 3; speech coding and privacy: 2; topics in asr: 2; statistical parametric speech synthesis: 2; asr: 1; health and affect: 1; privacy-preserving machine learning for audio & speech processing: 1; search/decoding techniques and confidence measures for asr: 1; computational resource constrained speech recognition: 1; multi-channel audio and emotion recognition: 1; speech enhancement: 1; speech in multimodality: 1; asr neural network architectures: 1; speech in health: 1; sequence-to-sequence speech recognition: 1; spoken term detection, confidence measure, and end-to-end speech recognition: 1; speech and audio source separation and scene analysis: 1; emotion and personality in conversation: 1; audio signal characterization: 1; speech and voice disorders: 1; nn architectures for asr: 1; speech synthesis paradigms and methods: 1; emotion recognition and analysis: 1; deep enhancement: 1; source separation and spatial analysis: 1; prosody modeling and generation: 1; music and audio processing: 1; speech recognition: 1; prosody and text processing: 1; speech enhancement and noise reduction: 1; source separation and spatial audio: 1

IEEE Keywords
speech recognition: 12; speech synthesis: 8; natural language processing: 6; end to end: 6; speaker recognition: 5; transfer learning: 4; text analysis: 3; recurrent neural nets: 3; attention: 3; emotion recognition: 3; low resource: 3; speech coding: 2; filtering theory: 2; text based speech editing: 2; text editing: 2; optimisation: 2; end to end model: 2; decoding: 2; autoregressive processes: 2; multimodal fusion: 2; self attention: 2; speaker adaptation: 2; matrix decomposition: 2; bottleneck features: 2; waveform generators: 1; stochastic processes: 1; vocoder: 1; deterministic plus stochastic: 1; multiband excitation: 1; noise control: 1; vocoders: 1; coarse to fine decoding: 1; mask prediction: 1; text to speech: 1; one shot learning: 1; digital health: 1; diseases: 1; microorganisms: 1; covid 19: 1; regression analysis: 1; depression: 1; lstm: 1; deep learning (artificial intelligence): 1; global information embedding: 1; behavioural sciences computing: 1; mask and prediction: 1; cross modal: 1; bert: 1; non autoregressive: 1; fast: 1; language modeling: 1; teacher student learning: 1; robust end to end speech recognition: 1; speech distortion: 1; speech enhancement: 1; speech transformer: 1; gated recurrent fusion: 1; iterative methods: 1; inverse problems: 1; vocal tract: 1; source filter model: 1; arx lf model: 1; glottal source: 1; signal denoising: 1; conversational transformer network (ctnet): 1; context sensitive modeling: 1; speaker sensitive modeling: 1; signal classification: 1; conversational emotion recognition: 1; decoupled transformer: 1; code switching: 1; automatic speech recognition: 1; bi level decoupling: 1; prosody modeling: 1; personalized speech synthesis: 1; speaking style modeling: 1; cross attention: 1; speech emotion recognition: 1; few shot speaker adaptation: 1; prosody and voice factorization: 1; the m2voc challenge: 1; prosody transfer: 1; optimization strategy: 1; audio signal processing: 1; audio visual systems: 1; transformer: 1; continuous emotion recognition: 1; model level fusion: 1; video signal processing: 1; multi head attention: 1; image fusion: 1; adversarial training: 1; cross lingual: 1; speaker embedding: 1; phoneme representation: 1; speech embedding: 1; word embedding: 1; punctuation prediction: 1; language invariant: 1; adversarial: 1; nonnegative matrix factorization (nmf): 1; deep neural network (dnn): 1; spectro temporal structures: 1; speech separation: 1; adversarial multilingual training: 1; deep neural networks: 1; blstm rnn: 1; joint training: 1; pitch estimation: 1; feature mapping: 1; image sequences: 1; real time magnetic resonance imaging sequences: 1; biomedical mri: 1; boundary intensity map: 1; real time systems: 1; splines (mathematics): 1; medical image processing: 1; tongue contour extraction: 1

Most Publications
2022: 49; 2021: 48; 2020: 42; 2019: 38; 2018: 26


SpeechComm2023 Jiangyan Yi, Jianhua Tao, Ye Bai, Zhengkun Tian, Cunhang Fan, 
Transfer knowledge for punctuation prediction via adversarial training.

SpeechComm2022 Wenhuan Lu, Xinyue Zhao, Na Guo, Yongwei Li, Jianguo Wei, Jianhua Tao, Jianwu Dang 0001, 
One-shot emotional voice conversion based on feature separation.

TASLP2022 Tao Wang 0074, Ruibo Fu, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, 
NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation.

TASLP2022 Tao Wang 0074, Jiangyan Yi, Ruibo Fu, Jianhua Tao, Zhengqi Wen, 
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing.

ICASSP2022 Cong Cai, Bin Liu 0041, Jianhua Tao, Zhengkun Tian, Jiahao Lu, Kexin Wang, 
End-to-End Network Based on Transformer for Automatic Detection of Covid-19.

ICASSP2022 Ya Li, Mingyue Niu, Ziping Zhao 0001, Jianhua Tao
Automatic Depression Level Assessment from Speech By Long-Term Global Information Embedding.

ICASSP2022 Tao Wang 0074, Jiangyan Yi, Liqun Deng, Ruibo Fu, Jianhua Tao, Zhengqi Wen, 
Context-Aware Mask Prediction Network for End-to-End Text-Based Speech Editing.

Interspeech2022 Shuai Zhang 0014, Jiangyan Yi, Zhengkun Tian, Jianhua Tao, Yu Ting Yeung, Liqun Deng, 
Reducing Multilingual Context Confusion for End-to-End Code-Switching Automatic Speech Recognition.

Interspeech2022 Jiahui Pan, Shuai Nie, Hui Zhang 0031, Shulin He, Kanghao Zhang, Shan Liang, Xueliang Zhang 0001, Jianhua Tao
Speaker recognition-assisted robust audio deepfake detection.

SpeechComm2021 Shan Liang, Guanjun Li, Shuai Nie, Zhanlei Yang, Wenju Liu, Jianhua Tao
Exploiting the directional coherence function for multichannel source extraction.

TASLP2021 Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang 0014, 
Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT.

TASLP2021 Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Zhengkun Tian, Shuai Zhang 0014, 
Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data.

TASLP2021 Cunhang Fan, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Bin Liu 0041, Zhengqi Wen, 
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition.

TASLP2021 Yongwei Li, Jianhua Tao, Donna Erickson, Bin Liu 0041, Masato Akagi, 
F0-Noise-Robust Glottal Source and Vocal Tract Analysis Based on ARX-LF Model.

TASLP2021 Zheng Lian, Bin Liu 0041, Jianhua Tao
CTNet: Conversational Transformer Network for Emotion Recognition.

ICASSP2021 Shuai Zhang 0014, Jiangyan Yi, Zhengkun Tian, Ye Bai, Jianhua Tao, Zhengqi Wen, 
Decoupling Pronunciation and Language for End-to-End Code-Switching Automatic Speech Recognition.

ICASSP2021 Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Tao Wang 0074, Chunyu Qiang, 
Bi-Level Style and Prosody Decoupling Modeling for Personalized End-to-End Speech Synthesis.

ICASSP2021 Licai Sun, Bin Liu 0041, Jianhua Tao, Zheng Lian, 
Multimodal Cross- and Self-Attention Network for Speech Emotion Recognition.

ICASSP2021 Tao Wang 0074, Ruibo Fu, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Chunyu Qiang, Shiming Wang, 
Prosody and Voice Factorization for Few-Shot Speaker Adaptation in the Challenge M2voc 2021.

Interspeech2021 Shuai Zhang 0014, Jiangyan Yi, Zhengkun Tian, Ye Bai, Jianhua Tao, Xuefei Liu, Zhengqi Wen, 
End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-Switching Speech Recognition.

#18  | Tomoki Toda | Google Scholar   DBLP
Venues
Interspeech: 37; ICASSP: 21; TASLP: 9; SpeechComm: 3

Years
2022: 11; 2021: 12; 2020: 10; 2019: 8; 2018: 8; 2017: 10; 2016: 11

ISCA Sections
speech synthesis: 11; voice conversion and adaptation: 3; neural techniques for voice conversion and waveform generation: 3; speech enhancement, bandwidth extension and hearing aids: 2; voice conversion and speech synthesis: 2; wavenet and novel paradigms: 2; special session: 2; the voicemos challenge: 1; technology for disordered speech: 1; the zero resource speech challenge 2020: 1; neural waveform generation: 1; novel paradigms for direct synthesis based on speech-related biosignals: 1; sequence models for asr: 1; speech synthesis paradigms and methods: 1; speech analysis and representation: 1; glottal source modeling: 1; speech-enhancement: 1; speech synthesis prosody: 1; co-inference of production and acoustics: 1

IEEE Keywords
speech synthesis: 18; vocoders: 8; voice conversion: 6; natural language processing: 5; recurrent neural nets: 5; speech recognition: 5; autoregressive processes: 4; speaker recognition: 4; neural vocoder: 4; gaussian processes: 4; speech intelligibility: 3; transformer: 3; convolutional neural nets: 3; probability: 3; speech enhancement: 3; speech coding: 3; filtering theory: 3; hidden markov models: 3; mos prediction: 2; sequence to sequence: 2; pitch dependent dilated convolution: 2; wavenet: 2; vocoder: 2; audio signal processing: 2; noise: 2; noise shaping: 2; regression analysis: 2; direct waveform modification: 2; microphones: 2; noise suppression: 2; silent speech communication: 2; external noise monitoring: 2; mean opinion score: 1; speech naturalness assessment: 1; streaming: 1; non autoregressive: 1; hearing: 1; speech quality assessment: 1; self supervised learning: 1; self supervised speech representation: 1; computer based training: 1; open source: 1; noisy to noisy vc: 1; voice conversion (vc): 1; noisy speech modeling: 1; signal denoising: 1; pretraining: 1; parallel wavegan: 1; quasi periodic wavenet: 1; pitch controllability: 1; quasi periodic structure: 1; emotion recognition: 1; perceived emotion: 1; listener adaptation: 1; speech emotion recognition: 1; language model: 1; bert: 1; text analysis: 1; open source software: 1; vector quantized variational autoencoder: 1; nonparallel: 1; dysarthria: 1; text to speech: 1; medical disorders: 1; wavegrad: 1; diffusion probabilistic vocoder: 1; sub modeling: 1; diffwave: 1; long short term memory recurrent neural networks: 1; customer services: 1; customer satisfaction: 1; call centres: 1; customer satisfaction (cs): 1; contact center call: 1; hierarchical multi task model: 1; supervised learning: 1; self attention: 1; sound event detection: 1; weakly supervised learning: 1; sequence to sequence model: 1; weighted forced attention: 1; forced alignment: 1; prediction theory: 1; shallow model: 1; laplacian distribution: 1; wavenet vocoder: 1; multiple samples output: 1; linear prediction: 1; gaussian inverse autoregressive flow: 1; fast fourier transforms: 1; fftnet: 1; parallel wavenet: 1; oversmoothed parameters: 1; wavenet fine tuning: 1; cyclic recurrent neural network: 1; entire audible frequency range: 1; multirate signal processing: 1; graphics processing units: 1; subband wavenet: 1; sampling methods: 1; perceptual weighting: 1; white noise: 1; quantisation (signal): 1; convolution: 1; noise analysis: 1; feedforward neural nets: 1; language translation: 1; emphasis estimation: 1; word level emphasis: 1; speech to speech translation: 1; emphasis translation: 1; intent: 1; hidden semi markov model (hsmm): 1; polyphonic sound event detection (sed): 1; hybrid model: 1; recurrent neural network: 1; long short term memory (lstm): 1; duration control: 1; statistical inversion and production mappings: 1; intercorrelation of articulators: 1; gaussian mixture model: 1; articulatory control: 1; speech modification: 1; tensors: 1; non negative matrix factorization: 1; nonaudible murmur: 1; pattern classification: 1; post filter: 1; modulation spectrum: 1; trees (mathematics): 1; smoothing methods: 1; gmm based voice conversion: 1; clustergen: 1; mixture models: 1; global variance: 1; oversmoothing: 1; statistical parametric speech synthesis: 1; spoofing attack: 1; speaker verification: 1; spectral analysis: 1; statistical singing voice conversion: 1; waveform analysis: 1; spectral differential: 1; f0 transformation: 1; cross gender conversion: 1; nonaudible murmur microphone: 1; blind source separation: 1; semi blind source separation: 1; interference suppression: 1; generative model: 1; product of experts: 1; f0 prediction: 1; electrolaryngeal speech enhancement: 1

Most Publications
2021: 43; 2014: 43; 2015: 36; 2022: 33; 2018: 32

Affiliations
URLs

SpeechComm2022 Takuma Okamoto, Keisuke Matsubara, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, 
Neural speech-rate conversion with multispeaker WaveNet vocoder.

ICASSP2022 Erica Cooper, Wen-Chin Huang, Tomoki Toda, Junichi Yamagishi, 
Generalization Ability of MOS Prediction Networks.

ICASSP2022 Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda
An Investigation of Streaming Non-Autoregressive sequence-to-sequence Voice Conversion.

ICASSP2022 Wen-Chin Huang, Erica Cooper, Junichi Yamagishi, Tomoki Toda
LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech.

ICASSP2022 Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe 0001, Tomoki Toda
S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations.

ICASSP2022 Chao Xie, Yi-Chiao Wu, Patrick Lumban Tobing, Wen-Chin Huang, Tomoki Toda
Direct Noisy Speech Modeling for Noisy-To-Noisy Voice Conversion.

Interspeech2022 Yeonjong Choi, Chao Xie, Tomoki Toda
An Evaluation of Three-Stage Voice Conversion Framework for Noisy and Reverberant Conditions.

Interspeech2022 Wen-Chin Huang, Erica Cooper, Yu Tsao 0001, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi, 
The VoiceMOS Challenge 2022.

Interspeech2022 Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda
Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition.

Interspeech2022 Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda
Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation.

Interspeech2022 Daiki Yoshioka, Yusuke Yasuda, Noriyuki Matsunaga, Yamato Ohtani, Tomoki Toda
Spoken-Text-Style Transfer with Conditional Variational Autoencoder and Content Word Storage.

TASLP2021 Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda
Pretraining Techniques for Sequence-to-Sequence Voice Conversion.

TASLP2021 Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda
Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.

TASLP2021 Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda
Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.

ICASSP2021 Atsushi Ando, Ryo Masumura, Hiroshi Sato, Takafumi Moriya, Takanori Ashihara, Yusuke Ijima, Tomoki Toda
Speech Emotion Recognition Based on Listener Adaptive Models.

ICASSP2021 Wen-Chin Huang, Chia-Hua Wu, Shang-Bao Luo, Kuan-Yu Chen, Hsin-Min Wang, Tomoki Toda
Speech Recognition by Simply Fine-Tuning Bert.

ICASSP2021 Kazuhiro Kobayashi, Wen-Chin Huang, Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda
Crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder.

ICASSP2021 Keisuke Matsubara, Takuma Okamoto, Ryoichi Takashima, Tetsuya Takiguchi, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, 
High-Intelligibility Speech Synthesis for Dysarthric Speakers with LPCNet-Based TTS and CycleVAE-Based VC.

ICASSP2021 Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, 
Noise Level Limited Sub-Modeling for Diffusion Probabilistic Vocoders.

Interspeech2021 Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Ching-Feng Liu, Yu Tsao 0001, Hsin-Min Wang, Tomoki Toda
A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion.

#19  | Li-Rong Dai 0001 | Google Scholar   DBLP
Venues: Interspeech: 35; ICASSP: 21; TASLP: 10; SpeechComm: 3; AAAI: 1
Years: 2022: 9; 2021: 9; 2020: 10; 2019: 11; 2018: 13; 2017: 7; 2016: 11
ISCA Sections: speech synthesis: 4; novel models and training methods for asr: 2; speaker and language recognition: 2; speaker recognition: 2; acoustic modeling with neural networks: 2; multimodal speech emotion recognition and paralinguistics: 1; multimodal systems: 1; language and accent recognition: 1; acoustic event detection and acoustic scene classification: 1; openasr20 and low resource asr development: 1; learning techniques for speaker recognition: 1; voice conversion and adaptation: 1; asr neural network architectures and training: 1; acoustic event detection: 1; corpus annotation and evaluation: 1; speaker recognition and diarization: 1; singing and multimodal synthesis: 1; speaker verification using neural network methods: 1; representation learning for emotion: 1; voice conversion and speech synthesis: 1; novel neural network architectures for acoustic modelling: 1; speech synthesis paradigms and methods: 1; neural network acoustic models for asr: 1; language recognition: 1; source separation and auditory scene analysis: 1; special session: 1; speech enhancement and noise reduction: 1; speech analysis: 1
IEEE Keywords: speech recognition: 14; speaker recognition: 10; speech synthesis: 7; hidden markov models: 5; speaker verification: 4; hidden markov model: 4; cepstral analysis: 3; supervised learning: 3; sequence to sequence: 3; convolutional neural nets: 3; gaussian processes: 3; voice conversion: 3; audio signal processing: 3; attention: 3; speech separation: 3; recurrent neural nets: 3; feedforward neural nets: 3; self supervised pre training: 2; deep learning (artificial intelligence): 2; probability: 2; tacotron: 2; autoregressive processes: 2; unit selection: 2; signal representation: 2; voice activity detection: 2; source separation: 2; sound event detection: 2; audio tagging: 2; deep neural network: 2; label permutation problem: 2; vocoders: 2; natural language processing: 2; text analysis: 2; speech enhancement: 2; representation learning: 1; anomalous sound detection: 1; knowledge based systems: 1; self supervised learning: 1; supervised pre training: 1; medical signal processing: 1; binary classification: 1; audio recording: 1; covid 19: 1; respiratory diagnosis: 1; end to end: 1; unsupervised domain adaptation: 1; label smoothing: 1; knowledge distillation: 1; emotion recognition: 1; convolutional neural network: 1; signal reconstruction: 1; style transformation: 1; speech emotion recognition: 1; disentanglement: 1; speech representation: 1; noise robustness: 1; wav2vec2.0: 1; sequence alignment: 1; encoder decoder: 1; post inference: 1; inference mechanisms: 1; end to end asr: 1; multi granularity: 1; text to speech: 1; embedding learning: 1; dense residual networks: 1; model ensemble: 1; adversarial training: 1; sequence to sequence (seq2seq): 1; disentangle: 1; linguistics: 1; matrix algebra: 1; model adaptation: 1; scaling: 1; ctc: 1; dilated convolution: 1; attention mechanism: 1; baum welch statistics: 1; speaker identification: 1; time domain: 1; target tracking: 1; time domain analysis: 1; sparse encoder: 1; semi supervised learning: 1; weakly labeled: 1; computational auditory scene analysis: 1; mel spectrogram: 1; signal classification: 1; weakly labelled data: 1; computational linguistics: 1; text supervision: 1; neural network: 1; statistics: 1; language identification deep neural network i vector lid senones: 1; speech coding: 1; speech bandwidth extension: 1; recurrent neural networks: 1; dilated convolutional neural networks: 1; bottleneck features: 1; neural net architecture: 1; regression analysis: 1; deep neural network (dnn): 1; multiobjective ensembling: 1; speech enhancement (se): 1; compact and low latency design: 1; multiobjective learning: 1; speech intelligibility: 1; progressive learning: 1; long short term memory: 1; post processing: 1; dense structure: 1; decoding: 1; vocabulary: 1; lfr: 1; dfsmn: 1; lvcsr: 1; blstm: 1; modelling: 1; convolution: 1; entropy: 1; fsmn: 1; gradient methods: 1; gender mixture detection: 1; speaker clustering: 1; unsupervised speech separation: 1; mixture models: 1; gender issues: 1; speaker dissimilarity measure: 1; language modeling: 1; cfsmn: 1; encoding: 1; time series: 1; feedforward sequential memory networks: 1; matrix decomposition: 1; deep neural networks: 1; estimation theory: 1; channel adaptation: 1; channel prior estimation: 1; probabilistic linear discriminant analysis: 1; multi source speaker verification: 1; convolutional codes: 1; what where auto encoder: 1; spectral envelope: 1; convolution neural network: 1; short duration utterance: 1; content aware local variability: 1; deep belief network: 1; postfilter: 1; restricted boltzmann machine: 1; filtering theory: 1; belief networks: 1; modulation: 1; compensation: 1; line spectral pair: 1; modulation spectrum: 1
Most Publications: 2014: 34; 2016: 33; 2018: 30; 2015: 27; 2022: 25

Affiliations
University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing, Hefei, China
URLs

ICASSP2022 Han Chen, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Self-Supervised Representation Learning for Unsupervised Anomalous Sound Detection Under Domain Shift.

ICASSP2022 Xing-Yu Chen, Qiu-Shi Zhu, Jie Zhang 0042, Li-Rong Dai 0001
Supervised and Self-Supervised Pretraining Based Covid-19 Detection Using Acoustic Breathing/Cough/Speech Signals.

ICASSP2022 Hang-Rui Hu, Yan Song 0001, Ying Liu, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Domain Robust Deep Embedding Learning for Speaker Recognition.

ICASSP2022 Yuxuan Xi, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Frontend Attributes Disentanglement for Speech Emotion Recognition.

ICASSP2022 Qiu-Shi Zhu, Jie Zhang 0042, Zi-qiang Zhang, Ming-Hui Wu, Xin Fang, Li-Rong Dai 0001
A Noise-Robust Self-Supervised Pre-Training Model Based Speech Representation Learning for Automatic Speech Recognition.

Interspeech2022 Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu 0001, Haizhou Li 0001, Tom Ko, Lirong Dai 0001, Jinyu Li 0001, Yao Qian, Furu Wei, 
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data.

Interspeech2022 Ye-Qian Du, Jie Zhang 0042, Qiu-Shi Zhu, Lirong Dai 0001, Ming-Hui Wu, Xin Fang, Zhou-Wang Yang, 
A Complementary Joint Training Approach Using Unpaired Speech and Text.

Interspeech2022 Hang-Rui Hu, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Class-Aware Distribution Alignment based Unsupervised Domain Adaptation for Speaker Verification.

Interspeech2022 Hai-tao Xu, Jie Zhang, Li-Rong Dai 0001
Differential Time-frequency Log-mel Spectrogram Features for Vision Transformer Based Infant Cry Recognition.

TASLP2021 Jian Tang, Jie Zhang 0042, Yan Song 0001, Ian McLoughlin 0001, Li-Rong Dai 0001
Multi-Granularity Sequence Alignment Mapping for Encoder-Decoder Based End-to-End ASR.

TASLP2021 Xiao Zhou, Zhen-Hua Ling, Li-Rong Dai 0001
UnitNet: A Sequence-to-Sequence Acoustic Model for Concatenative Speech Synthesis.

ICASSP2021 Ying Liu, Yan Song 0001, Ian McLoughlin 0001, Lin Liu, Li-Rong Dai 0001
An Effective Deep Embedding Learning Method Based on Dense-Residual Networks for Speaker Verification.

Interspeech2021 Hang Chen, Jun Du, Yu Hu 0003, Li-Rong Dai 0001, Bao-Cai Yin, Chin-Hui Lee, 
Automatic Lip-Reading with Hierarchical Pyramidal Convolution and Self-Attention for Image Sequences with No Word Boundaries.

Interspeech2021 Hui Wang, Lin Liu, Yan Song 0001, Lei Fang, Ian McLoughlin 0001, Li-Rong Dai 0001
A Weight Moving Average Based Alternate Decoupled Learning Algorithm for Long-Tailed Language Identification.

Interspeech2021 Xu Zheng, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
An Effective Mutual Mean Teaching Based Domain Adaptation Method for Sound Event Detection.

Interspeech2021 Xiao Zhou, Zhen-Hua Ling, Li-Rong Dai 0001
UnitNet-Based Hybrid Speech Synthesis.

Interspeech2021 Qiu-Shi Zhu, Jie Zhang 0042, Ming-Hui Wu, Xin Fang, Li-Rong Dai 0001
An Improved Wav2Vec 2.0 Pre-Training Approach Using Enhanced Local Dependency Modeling for Speech Recognition.

AAAI2021 Jing-Xuan Zhang, Korin Richmond, Zhen-Hua Ling, Lirong Dai 0001
TaLNet: Voice Reconstruction from Tongue and Lip Articulation with Transfer Learning from Text-to-Speech Synthesis.

TASLP2020 Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai 0001
Non-Parallel Sequence-to-Sequence Voice Conversion With Disentangled Linguistic and Speaker Representations.

ICASSP2020 Fenglin Ding, Wu Guo, Lirong Dai 0001, Jun Du, 
Attention-Based Gated Scaling Adaptive Acoustic Model for CTC-Based Speech Recognition.

#20  | Zhiyong Wu 0001 | Google Scholar   DBLP
Venues: ICASSP: 31; Interspeech: 31; TASLP: 3; AAAI: 2; EMNLP: 1; IJCAI: 1; SpeechComm: 1
Years: 2022: 19; 2021: 13; 2020: 6; 2019: 11; 2018: 8; 2017: 6; 2016: 7
ISCA Sections: speech synthesis: 10; voice conversion and adaptation: 3; spoken term detection: 2; single-channel speech enhancement: 1; embedding and network architecture for speaker recognition: 1; non-autoregressive sequential modeling for speech processing: 1; voice anti-spoofing and countermeasure: 1; speech synthesis paradigms and methods: 1; asr neural network architectures and training: 1; new trends in self-supervised speech processing: 1; neural techniques for voice conversion and waveform generation: 1; emotion recognition and analysis: 1; expressive speech synthesis: 1; deep learning for source separation and pitch tracking: 1; prosody and text processing: 1; voice conversion: 1; emotion modeling: 1; behavioral signal processing and speaker state and traits analytics: 1; spoken documents, spoken understanding and semantic analysis: 1
IEEE Keywords: speech recognition: 17; speech synthesis: 16; natural language processing: 13; recurrent neural nets: 9; speaker recognition: 8; emotion recognition: 8; speech emotion recognition: 7; text analysis: 5; speech coding: 5; multi task learning: 3; vocoders: 3; mispronunciation detection and diagnosis: 3; regression analysis: 3; pattern classification: 2; security of data: 2; trees (mathematics): 2; text to speech: 2; transformer: 2; expressive speech synthesis: 2; speaking style: 2; deep learning (artificial intelligence): 2; optimisation: 2; computer aided pronunciation training: 2; decoding: 2; voice conversion: 2; entropy: 2; ordinal regression: 2; code switching: 2; human computer interaction: 2; convolutional neural nets: 2; multilingual: 2; bidirectional long short term memory (blstm): 2; cross lingual: 2; supervised learning: 1; adversarial attacks: 1; automatic speaker verification: 1; adversarial defense: 1; self supervised learning: 1; computational linguistics: 1; image segmentation: 1; span based decoder: 1; tree structure: 1; character level: 1; prosodic structure prediction: 1; phonetic posteriorgrams: 1; speech to animation: 1; mixture of experts: 1; computer animation: 1; hierarchical: 1; xlnet: 1; knowledge distillation: 1; speaking style modelling: 1; graph neural network: 1; conversational text to speech synthesis: 1; bidirectional attention mechanism: 1; matrix algebra: 1; hidden markov models: 1; forced alignment: 1; end to end model: 1; biometrics (access control): 1; speaker verification: 1; adversarial attack: 1; vocoder: 1; audio signal processing: 1; uniform sampling: 1; path dropout: 1; neural architecture search: 1; phoneme recognition: 1; acoustic phonetic linguistic embeddings: 1; hybrid bottleneck features: 1; voice activity detection: 1; disentangling: 1; cross entropy: 1; connectionist temporal classification: 1; residual error: 1; capsule: 1; exemplary emotion descriptor: 1; spatial information: 1; recurrent: 1; capsule network: 1; sequential: 1; expressive: 1; emotion: 1; global style token: 1; autoregressive processes: 1; non autoregressive: 1; ctc: 1; neural network based text to speech: 1; syntactic parse tree traversal: 1; word processing: 1; grammars: 1; prosody control: 1; syntactic representation learning: 1; pronunciation assessment: 1; computer aided instruction: 1; computer assisted language learning: 1; goodness of pronunciation: 1; linguistics: 1; multi speaker and multi style tts: 1; durian: 1; hifi gan: 1; low resource condition: 1; speech intelligibility: 1; phonetic posteriorgrams: 1; accented speech recognition: 1; accent conversion: 1; end to end: 1; multilingual speech synthesis: 1; foreign accent: 1; center loss: 1; spectral analysis: 1; discriminative features: 1; dilated residual network: 1; multi head self attention: 1; self attention: 1; wavenet: 1; blstm: 1; phonetic posteriorgrams (ppgs): 1; speech fluency assessment: 1; anchored reference sample: 1; mean opinion score (mos): 1; computer assisted language learning (call): 1; quasifully recurrent neural network (qrnn): 1; variational inference: 1; text to speech (tts) synthesis: 1; parallel wavenet: 1; convolutional neural network (cnn): 1; parallel processing: 1; capsule networks: 1; spatial relationship information: 1; recurrent connection: 1; utterance level features: 1; unsupervised clustering: 1; extended phoneme set in l2 speech: 1; mispronunciation patterns: 1; phonemic posterior grams: 1; feature representation: 1; acoustic phonemic model: 1; style adaptation: 1; expressiveness: 1; style feature: 1; acoustic model: 1; structured output layer: 1; deep bidirectional long short term memory: 1; emphasis detection: 1; deep bidirectional long short term memory (dblstm): 1; talking avatar: 1; low level descriptors (lld): 1; bottleneck feature: 1; bidirectional recurrent neural network (brnn): 1; gated recurrent unit (gru): 1; question detection: 1; low resource: 1
Most Publications: 2022: 47; 2021: 32; 2019: 18; 2020: 15; 2018: 14

Affiliations
Tsinghua University, Joint Research Center for Media Sciences, Beijing, China (PhD)
Chinese University of Hong Kong, Hong Kong

TASLP2022 Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu 0001, Helen Meng, Hung-Yi Lee, 
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning.

ICASSP2022 Xueyuan Chen, Changhe Song, Yixuan Zhou, Zhiyong Wu 0001, Changbin Chen, Zhongqin Wu, Helen Meng, 
A Character-Level Span-Based Model for Mandarin Prosodic Structure Prediction.

ICASSP2022 Liyang Chen, Zhiyong Wu 0001, Jun Ling, Runnan Li, Xu Tan 0003, Sheng Zhao, 
Transformer-S2A: Robust and Efficient Speech-to-Animation.

ICASSP2022 Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.

ICASSP2022 Jingbei Li, Yi Meng, Chenyi Li, Zhiyong Wu 0001, Helen Meng, Chao Weng, Dan Su 0002, 
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling.

ICASSP2022 Jingbei Li, Yi Meng, Zhiyong Wu 0001, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang 0002, 
Neufa: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism.

ICASSP2022 Haibin Wu, Po-Chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang 0006, Zhiyong Wu 0001, Helen Meng, Hung-Yi Lee, 
Adversarial Sample Detection for Speaker Verification by Neural Vocoders.

ICASSP2022 Xixin Wu, Shoukang Hu, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Neural Architecture Search for Speech Emotion Recognition.

ICASSP2022 Wenxuan Ye, Shaoguang Mao, Frank K. Soong, Wenshan Wu, Yan Xia 0005, Jonathan Tien, Zhiyong Wu 0001
An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings.

ICASSP2022 Xintao Zhao, Feng Liu, Changhe Song, Zhiyong Wu 0001, Shiyin Kang, Deyi Tuo, Helen Meng, 
Disentangling Content and Fine-Grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion.

Interspeech2022 Jun Chen, Wei Rao, Zilin Wang, Zhiyong Wu 0001, Yannan Wang, Tao Yu, Shidong Shang, Helen Meng, 
Speech Enhancement with Fullband-Subband Cross-Attention Network.

Interspeech2022 Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu 0001, Helen Meng, 
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information.

Interspeech2022 Shun Lei, Yixuan Zhou, Liyang Chen, Jiankun Hu, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.

Interspeech2022 Xiang Li, Changhe Song, Xianhao Wei, Zhiyong Wu 0001, Jia Jia 0001, Helen Meng, 
Towards Cross-speaker Reading Style Transfer on Audiobook Dataset.

Interspeech2022 Yi Meng, Xiang Li, Zhiyong Wu 0001, Tingtian Li, Zixun Sun, Xinyu Xiao, Chi Sun, Hui Zhan, Helen Meng, 
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis.

Interspeech2022 Sicheng Yang, Methawee Tantrawenith, Haolin Zhuang, Zhiyong Wu 0001, Aolan Sun, Jianzong Wang, Ning Cheng 0001, Huaizhen Tang, Xintao Zhao, Jie Wang, Helen Meng, 
Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion.

Interspeech2022 Yang Zhang, Zhiqiang Lv, Haibin Wu, Shanshan Zhang, Pengfei Hu, Zhiyong Wu 0001, Hung-yi Lee, Helen Meng, 
MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification.

Interspeech2022 Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information.

Interspeech2022 Yixuan Zhou, Changhe Song, Xiang Li, Luwen Zhang, Zhiyong Wu 0001, Yanyao Bian, Dan Su 0002, Helen Meng, 
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis.

TASLP2021 Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Shiyin Kang, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Exemplar-Based Emotive Speech Synthesis.

#21  | Yu Tsao 0001 | Google Scholar   DBLP
Venues: Interspeech: 42; ICASSP: 13; TASLP: 9; SpeechComm: 3; NeurIPS: 1; ICML: 1
Years: 2022: 17; 2021: 9; 2020: 9; 2019: 12; 2018: 8; 2017: 7; 2016: 7
ISCA Sections: speech enhancement and intelligibility: 5; speech enhancement: 5; single-channel speech enhancement: 4; dereverberation, noise reduction, and speaker extraction: 2; voice conversion and adaptation: 2; speech synthesis: 2; neural techniques for voice conversion and waveform generation: 2; voice conversion: 2; special session: 2; speech production, perception and multimodality: 1; the voicemos challenge: 1; source separation: 1; speech intelligibility prediction for hearing-impaired listeners: 1; speech coding and privacy: 1; noise reduction and intelligibility: 1; intelligibility-enhancing speech modification: 1; model training for asr: 1; speech and audio classification: 1; speech intelligibility and quality: 1; audio events and acoustic scenes: 1; speech analysis and representation: 1; speech-enhancement: 1; discriminative training for asr: 1; speech enhancement and noise reduction: 1; language recognition: 1
IEEE Keywords: speech enhancement: 12; speech recognition: 8; speaker recognition: 4; recurrent neural nets: 3; signal denoising: 3; deep neural network: 3; pattern classification: 3; automatic speech recognition: 3; data compression: 2; convolutional neural nets: 2; reverberation: 2; deep learning (artificial intelligence): 2; unsupervised learning: 2; audio signal processing: 2; speaker verification: 2; decoding: 2; natural language processing: 2; matrix decomposition: 2; mean square error methods: 2; optimisation: 2; filtering theory: 2; audio visual systems: 1; asynchronous multimodal learning: 1; audio visual: 1; low quality data: 1; data privacy: 1; floating point integer arithmetic circuit: 1; deep neural network model compression: 1; speech dereverberation: 1; floating point arithmetic: 1; adders: 1; inference acceleration: 1; supervised learning: 1; metricgan: 1; unsupervised speech enhancement: 1; medical signal processing: 1; non invasive: 1; sensor fusion: 1; electromyography: 1; multimodal: 1; anti spoofing: 1; partially fake audio detection: 1; biometrics (access control): 1; speech synthesis: 1; security of data: 1; audio deep synthesis detection challenge: 1; and heterogeneous computing: 1; quantum computing: 1; text classification: 1; temporal convolution: 1; text analysis: 1; quantum machine learning: 1; spoken language understanding: 1; generative model: 1; bayes methods: 1; affine transforms: 1; discriminative model: 1; joint bayesian model: 1; optimal transport: 1; unsupervised domain adaptation: 1; spoken language identification: 1; statistical distributions: 1; speech separation: 1; maml: 1; anil: 1; source separation: 1; meta learning: 1; gaussian processes: 1; subspace based representation: 1; phonotactic language recognition: 1; support vector machines: 1; subspace based learning: 1; multichannel speech enhancement: 1; raw waveform mapping: 1; microphones: 1; inner ear microphones: 1; phase estimation: 1; distributed microphones: 1; fully convolutional network (fcn): 1; deep neural networks: 1; ensemble learning: 1; decision trees: 1; generalizability: 1; dynamically sized decision tree: 1; regression analysis: 1; deep denoising autoencoder: 1; signal classification: 1; character error rate: 1; reinforcement learning: 1; end to end speech enhancement: 1; speech intelligibility: 1; fully convolutional neural network: 1; raw waveform: 1; statistics: 1; data transmission efficiency: 1; discrete wavelet transform: 1; discrete wavelet transforms: 1; feature compression: 1; wireless channels: 1; distributed speech recognition: 1; diarization: 1; overlap detection: 1; noise: 1; speaker diarization: 1; densely connected progressive learning: 1; multiple target learning: 1; deep learning based speech enhancement: 1; highly mismatch condition: 1; recurrent neural network: 1; social networking (online): 1; social network: 1; personalized language modeling: 1; mobile computing: 1; plda: 1; autoencoders: 1; discriminative training: 1; postfiltering: 1; locally linear embedding: 1; spectral analysis: 1; nonnegative matrix factorization (nmf): 1; mandarin speech recognition: 1; frequency lowering technology: 1
Most Publications: 2022: 66; 2021: 55; 2020: 44; 2019: 43; 2018: 37

Affiliations
Academia Sinica, Research Center for Information Technology Innovation, Taipei, Taiwan

TASLP2022 Shang-Yi Chuang, Hsin-Min Wang, Yu Tsao 0001
Improved Lite Audio-Visual Speech Enhancement.

TASLP2022 Yu-Chen Lin, Cheng Yu, Yi-Te Hsu, Szu-Wei Fu, Yu Tsao 0001, Tei-Wei Kuo, 
SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points.

ICASSP2022 Szu-Wei Fu, Cheng Yu, Kuo-Hsuan Hung, Mirco Ravanelli, Yu Tsao 0001
MetricGAN-U: Unsupervised Speech Enhancement/ Dereverberation Based Only on Noisy/ Reverberated Speech.

ICASSP2022 Kuan-Chen Wang, Kai-Chun Liu, Hsin-Min Wang, Yu Tsao 0001
EMGSE: Acoustic/EMG Fusion for Multimodal Speech Enhancement.

ICASSP2022 Haibin Wu, Heng-Cheng Kuo, Naijun Zheng, Kuo-Hsuan Hung, Hung-Yi Lee, Yu Tsao 0001, Hsin-Min Wang, Helen Meng, 
Partially Fake Audio Detection by Self-Attention-Based Fake Span Discovery.

ICASSP2022 Chao-Han Huck Yang, Jun Qi 0002, Samuel Yen-Chi Chen, Yu Tsao 0001, Pin-Yu Chen, 
When BERT Meets Quantum Temporal Convolution Learning for Text Classification in Heterogeneous Computing.

Interspeech2022 Rong Chao, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao 0001
Perceptual Contrast Stretching on Target Feature for Speech Enhancement.

Interspeech2022 Yu-Wen Chen, Yu Tsao 0001
InQSS: a speech intelligibility and quality assessment model using a multi-task learning network.

Interspeech2022 Wen-Chin Huang, Erica Cooper, Yu Tsao 0001, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi, 
The VoiceMOS Challenge 2022.

Interspeech2022 Kuo-Hsuan Hung, Szu-Wei Fu, Huan-Hsin Tseng, Hsin-Tien Chiang, Yu Tsao 0001, Chii-Wann Lin, 
Boosting Self-Supervised Embeddings for Speech Enhancement.

Interspeech2022 Chi-Chang Lee, Cheng-Hung Hu, Yu-Chen Lin, Chu-Song Chen, Hsin-Min Wang, Yu Tsao 0001
NASTAR: Noise Adaptive Speech Enhancement with Target-Conditional Resampling.

Interspeech2022 Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao 0001, Yanmin Qian, Shinji Watanabe 0001, 
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.

Interspeech2022 Chiang-Jen Peng, Yun-Ju Chan, Yih-Liang Shen, Cheng Yu, Yu Tsao 0001, Tai-Shih Chi, 
Perceptual Characteristics Based Multi-objective Model for Speech Enhancement.

Interspeech2022 Fan-Lin Wang, Hung-Shin Lee, Yu Tsao 0001, Hsin-Min Wang, 
Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks.

Interspeech2022 Cheng Yu, Szu-Wei Fu, Tsun-An Hsieh, Yu Tsao 0001, Mirco Ravanelli, 
OSSEM: one-shot speaker adaptive speech enhancement using meta learning.

Interspeech2022 Ryandhimas Edo Zezario, Fei Chen 0011, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao 0001
MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids.

Interspeech2022 Ryandhimas Edo Zezario, Szu-Wei Fu, Fei Chen 0011, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao 0001
MTI-Net: A Multi-Target Speech Intelligibility Prediction Model.

TASLP2021 Xugang Lu, Peng Shen, Yu Tsao 0001, Hisashi Kawai, 
Coupling a Generative Model With a Discriminative Learning Framework for Speaker Verification.

ICASSP2021 Xugang Lu, Peng Shen, Yu Tsao 0001, Hisashi Kawai, 
Unsupervised Neural Adaptation Model Based on Optimal Transport for Spoken Language Identification.

ICASSP2021 Yuan-Kuei Wu, Kuan-Po Huang, Yu Tsao 0001, Hung-yi Lee, 
One Shot Learning for Speech Separation.

#22  | Tara N. Sainath | Google Scholar   DBLP
Venues: Interspeech: 41; ICASSP: 25; ICLR: 1; TASLP: 1
Years: 2022: 14; 2021: 10; 2020: 8; 2019: 10; 2018: 9; 2017: 9; 2016: 8
ISCA Sections: neural network acoustic models for asr: 4; asr technologies and systems: 2; asr: 2; multi-, cross-lingual and other topics in asr: 2; cross-lingual and multilingual asr: 2; asr neural network architectures: 2; far-field speech recognition: 2; far-field speech processing: 2; feature extraction and acoustic modeling using neural networks for asr: 2; search/decoding algorithms for asr: 1; speech analysis: 1; language modeling and lexical modeling for asr: 1; speech representation: 1; novel models and training methods for asr: 1; resource-constrained asr: 1; language and lexical modeling for asr: 1; novel neural network architectures for asr: 1; streaming for asr/rnn transducers: 1; neural network training methods for asr: 1; speech classification: 1; lm adaptation, lexical units and punctuation: 1; asr neural network architectures and training: 1; spoken term detection, confidence measure, and end-to-end speech recognition: 1; end-to-end speech recognition: 1; acoustic model adaptation: 1; recurrent neural models for asr: 1; speech and audio segmentation and classification: 1; discriminative training for asr: 1; neural networks in speech recognition: 1; vad and audio events: 1
IEEE Keywords: speech recognition: 24; recurrent neural nets: 12; natural language processing: 8; speech coding: 7; decoding: 4; rnn t: 4; conformer: 3; sequence to sequence: 3; text analysis: 2; speaker recognition: 2; asr: 2; latency: 2; voice activity detection: 2; optimisation: 2; probability: 2; vocabulary: 2; filtering theory: 2; multilingual: 2; speech enhancement: 2; transducers: 1; rnnt: 1; two pass asr: 1; long form asr: 1; end to end asr: 1; fusion: 1; signal representation: 1; bilinear pooling: 1; gating: 1; cascaded encoders: 1; calibration: 1; mean square error methods: 1; attention based end to end models: 1; automatic speech recognition: 1; transformer: 1; confidence: 1; echo state network: 1; echo: 1; long form: 1; regression analysis: 1; endpointer: 1; supervised learning: 1; phonetics: 1; biasing: 1; attention: 1; sequence to sequence models: 1; unsupervised learning: 1; semi supervised training: 1; spelling correction: 1; language model: 1; attention models: 1; mobile handsets: 1; speech synthesis: 1; end to end speech synthesis: 1; end to end speech recognition: 1; lstm: 1; feedforward neural nets: 1; cnn: 1; deep neural network model: 1; phase sensitive model spectral distortion model: 1; far field speech recognition: 1; phase distortion training: 1; spectral distortion training: 1; multi dialect: 1; adaptation: 1; computational linguistics: 1; encoder decoder: 1; seq2seq: 1; indian: 1; noise robust speech recognition: 1; microphones: 1; array signal processing: 1; beamforming: 1; spatial filters: 1; direction of arrival estimation: 1; channel bank filters: 1; matrix decomposition: 1; data compression: 1; redundancy: 1; acoustic convolution: 1
Most Publications: 2022: 45; 2021: 24; 2019: 24; 2020: 20; 2018: 20


ICASSP2022 Ke Hu, Tara N. Sainath, Arun Narayanan, Ruoming Pang, Trevor Strohman, 
Transducer-Based Streaming Deliberation for Cascaded Encoders.

ICASSP2022 Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.

ICASSP2022 Chao Zhang, Bo Li 0028, Zhiyun Lu, Tara N. Sainath, Shuo-Yiin Chang, 
Improving the Fusion of Acoustic and Text Representations in RNN-T.

Interspeech2022 Shuo-Yiin Chang, Bo Li 0028, Tara N. Sainath, Chao Zhang, Trevor Strohman, Qiao Liang, Yanzhang He, 
Turn-Taking Prediction for Natural Conversational Speech.

Interspeech2022 Shuo-Yiin Chang, Guru Prakash, Zelin Wu, Tara N. Sainath, Bo Li 0028, Qiao Liang, Adam Stambler, Shyam Upadhyay, Manaal Faruqui, Trevor Strohman, 
Streaming Intended Query Detection using E2E Modeling for Continued Conversation.

Interspeech2022 Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang 0016, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman, 
A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes.

Interspeech2022 Ke Hu, Tara N. Sainath, Yanzhang He, Rohit Prabhavalkar, Trevor Strohman, Sepand Mavandadi, Weiran Wang, 
Improving Deliberation by Text-Only and Semi-Supervised Training.

Interspeech2022 W. Ronny Huang, Shuo-Yiin Chang, David Rybach, Tara N. Sainath, Rohit Prabhavalkar, Cal Peyser, Zhiyun Lu, Cyril Allauzen, 
E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR.

Interspeech2022 W. Ronny Huang, Cal Peyser, Tara N. Sainath, Ruoming Pang, Trevor D. Strohman, Shankar Kumar, 
Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition.

Interspeech2022 Bo Li 0028, Tara N. Sainath, Ruoming Pang, Shuo-Yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani, 
A Language Agnostic Multilingual Streaming On-Device ASR System.

Interspeech2022 Cal Peyser, W. Ronny Huang, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho, 
Towards Disentangled Speech Representations.

Interspeech2022 Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, W. Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach, 
Improving Rare Word Recognition with LM-aware MWER Training.

Interspeech2022 Weiran Wang, Ke Hu, Tara N. Sainath
Streaming Align-Refine for Non-autoregressive Deliberation.

Interspeech2022 Chao Zhang, Bo Li 0028, Tara N. Sainath, Trevor Strohman, Sepand Mavandadi, Shuo-Yiin Chang, Parisa Haghani, 
Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification.

ICASSP2021 Bo Li 0028, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han 0002, Qiao Liang, Yu Zhang 0033, Trevor Strohman, Yonghui Wu, 
A Better and Faster end-to-end Model for Streaming ASR.

ICASSP2021 David Qiu, Qiujia Li, Yanzhang He, Yu Zhang 0033, Bo Li 0028, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li 0133, Ke Hu, Tara N. Sainath, Ian McGraw, 
Learning Word-Level Confidence for Subword End-To-End ASR.

ICASSP2021 Harsh Shrivastava 0001, Ankush Garg, Yuan Cao 0007, Yu Zhang 0033, Tara N. Sainath
Echo State Speech Recognition.

ICASSP2021 Jiahui Yu, Chung-Cheng Chiu, Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han 0002, Anmol Gulati, Yonghui Wu, Ruoming Pang, 
FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.

Interspeech2021 Rami Botros, Tara N. Sainath, Robert David, Emmanuel Guzman, Wei Li 0133, Yanzhang He, 
Tied & Reduced RNN-T Decoder.

Interspeech2021 W. Ronny Huang, Tara N. Sainath, Cal Peyser, Shankar Kumar, David Rybach, Trevor Strohman, 
Lookup-Table Recurrent Language Models for Long Tail Speech Recognition.

#23  | Mark Hasegawa-Johnson | Google Scholar   DBLP
Venues: Interspeech: 33; ICASSP: 18; TASLP: 7; ICML: 4; SpeechComm: 2; ACL: 1; NAACL: 1
Years: 2022: 10; 2021: 12; 2020: 11; 2019: 7; 2018: 11; 2017: 10; 2016: 5
ISCA Sections: special session: 3; low resource speech recognition: 2; spoken language modeling and understanding: 1; atypical speech detection: 1; cross/multi-lingual asr: 1; speech synthesis: 1; topics in asr: 1; the first dicova challenge: 1; noise reduction and intelligibility: 1; applications of asr: 1; cross/multi-lingual and code-switched speech recognition: 1; spoken language understanding: 1; speech translation and multilingual/multimodal learning: 1; phonetic event detection and segmentation: 1; diarization: 1; speech and voice disorders: 1; model adaptation for asr: 1; speech in the brain: 1; spoken term detection: 1; extracting information from audio: 1; adjusting to speaker, accent, and domain: 1; topics in speech recognition: 1; multimodal systems: 1; deep neural networks: 1; speaker state and trait: 1; speech recognition: 1; multi-lingual models and adaptation for asr: 1; source separation and voice activity detection: 1; speech-enhancement: 1; multi-channel speech enhancement: 1
IEEE Keywords: natural language processing: 14; speech recognition: 13; automatic speech recognition: 5; signal classification: 4; speaker recognition: 4; speech synthesis: 4; ctc: 3; voice conversion: 3; decoding: 3; unsupervised learning: 2; medical signal processing: 2; speaker adaptation: 2; probability: 2; transfer learning: 2; recurrent neural nets: 2; text analysis: 2; image retrieval: 2; machine translation: 2; acoustic modeling: 2; vocal expression: 2; perception: 2; multi task learning: 2; under resourced languages: 2; cross lingual adaptation: 1; autosegmental phonology: 1; ipa: 1; under resourced asr: 1; computer based training: 1; tones: 1; speech disentanglement: 1; time frequency analysis: 1; diseases: 1; pneumodynamics: 1; dicova ii: 1; telemedicine: 1; covid 19: 1; audio signal processing: 1; medical signal detection: 1; patient diagnosis: 1; speaker change detection: 1; affine transforms: 1; speaker segmentation: 1; signal detection: 1; counterfactual fairness: 1; fairness in machine learning: 1; multilingual: 1; phonotactics: 1; zero shot learning: 1; asr: 1; speech intelligibility: 1; data augmentation: 1; medical disorders: 1; dysarthric speech: 1; encoder decoder: 1; sequence to sequence: 1; image captioning: 1; image to speech: 1; child speech: 1; paediatrics: 1; speaker diarization: 1; voice activity detection: 1; language development: 1; convolution: 1; multiple instance learning: 1; behavioural sciences computing: 1; speech codecs: 1; signal reconstruction: 1; source separation: 1; human computer interaction: 1; image representation: 1; language translation: 1; unsupervised word discovery: 1; multimodal learning: 1; language acquisition: 1; autoencoder: 1; f0 conversion: 1; wavenet vocoder: 1; acoustic landmarks: 1; end to end: 1; smoothing methods: 1; emotion recognition: 1; latent semantic analysis: 1; dimensional analysis: 1; laughter: 1; gaussian processes: 1; phone recognition: 1; mismatched transcription: 1; hidden markov models: 1; probabilistic transcription: 1; mismatched machine transcription: 1; crowdsourcing: 1; zero resourced languages: 1; modular system: 1; automatic speech recognition (asr): 1; low resource asr: 1; bayes methods: 1; acoustic unit discovery: 1; bayesian model: 1; informative prior: 1; multi modal data: 1; unwritten languages: 1; unsupervised unit discovery: 1; linguistics: 1; multi accent speech recognition: 1; end to end models: 1; electroencephalography: 1; speech coding: 1; eeg: 1; mismatched crowdsourcing: 1; formal languages: 1; grapheme to phoneme conversion: 1; low resource languages: 1; recurrent neural network models: 1; unscripted speech: 1; acoustic correlates: 1; paralingual speech: 1; oral histories: 1; asr for under resourced languages: 1; asr adaptation: 1; mismatched transcriptions: 1; nasal coda: 1; pronunciation error detection: 1; computer aided pronunciation training: 1; landmark: 1; error detection: 1
Most Publications: 2020: 29; 2022: 27; 2018: 22; 2017: 22; 2016: 20


SpeechComm2022 Heting Gao, Xiaoxuan Wang, Sunghun Kang, Rusty Mina, Dias Issa, John B. Harvill, Leda Sari, Mark Hasegawa-Johnson, Chang D. Yoo, 
Seamless equal accuracy ratio for inclusive CTC speech recognition.

TASLP2022 Jialu Li, Mark Hasegawa-Johnson
Autosegmental Neural Nets 2.0: An Extensive Study of Training Synchronous and Asynchronous Phones and Tones for Under-Resourced Tonal Languages.

ICASSP2022 Chak Ho Chan, Kaizhi Qian, Yang Zhang 0001, Mark Hasegawa-Johnson
SpeechSplit2.0: Unsupervised Speech Disentanglement for Voice Conversion without Tuning Autoencoder Bottlenecks.

ICASSP2022 John B. Harvill, Yash R. Wani, Moitreya Chatterjee, Mustafa Alam, David G. Beiser, David Chestek, Mark Hasegawa-Johnson, Narendra Ahuja, 
Detection of Covid-19 from Joint Time and Frequency Analysis of Speech, Breathing and Cough Audio.

Interspeech2022 Heting Gao, Junrui Ni, Kaizhi Qian, Yang Zhang 0001, Shiyu Chang, Mark Hasegawa-Johnson
WavPrompt: Towards Few-Shot Spoken Language Understanding with Frozen Language Models.

Interspeech2022 John B. Harvill, Mark Hasegawa-Johnson, Chang D. Yoo, 
Frame-Level Stutter Detection.

Interspeech2022 Mahir Morshed, Mark Hasegawa-Johnson
Cross-lingual articulatory feature information transfer for speech recognition using recurrent progressive neural networks.

Interspeech2022 Junrui Ni, Liming Wang, Heting Gao, Kaizhi Qian, Yang Zhang 0001, Shiyu Chang, Mark Hasegawa-Johnson
Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition.

ICML2022 Kaizhi Qian, Yang Zhang 0001, Heting Gao, Junrui Ni, Cheng-I Lai, David D. Cox, Mark Hasegawa-Johnson, Shiyu Chang, 
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers.

ACL2022 Liming Wang, Siyuan Feng, Mark Hasegawa-Johnson, Chang Dong Yoo, 
Self-supervised Semantic-driven Phoneme Discovery for Zero-resource Speech Recognition.

SpeechComm2021 Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain, 
Analysis of acoustic and voice quality features for the classification of infant and mother vocalizations.

TASLP2021 Leda Sari, Mark Hasegawa-Johnson, Samuel Thomas 0001, 
Auxiliary Networks for Joint Speaker Adaptation and Speaker Change Detection.

TASLP2021 Leda Sari, Mark Hasegawa-Johnson, Chang D. Yoo, 
Counterfactually Fair Automatic Speech Recognition.

ICASSP2021 Siyuan Feng 0001, Piotr Zelasko, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak, 
How Phonotactics Affect Multilingual and Zero-Shot ASR Performance.

ICASSP2021 John B. Harvill, Dias Issa, Mark Hasegawa-Johnson, Chang Dong Yoo, 
Synthesis of New Words for Improved Dysarthric Speech Recognition on an Expanded Vocabulary.

ICASSP2021 Xinsheng Wang, Siyuan Feng 0001, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg, 
Show and Speak: Directly Synthesize Spoken Description of Images.

ICASSP2021 Junzhe Zhu, Mark Hasegawa-Johnson, Nancy L. McElwain, 
A Comparison Study on Infant-Parent Voice Diarization.

ICASSP2021 Junzhe Zhu, Raymond A. Yeh, Mark Hasegawa-Johnson
Multi-Decoder Dprnn: Source Separation for Variable Number of Speakers.

Interspeech2021 Heting Gao, Junrui Ni, Yang Zhang 0001, Kaizhi Qian, Shiyu Chang, Mark Hasegawa-Johnson
Zero-Shot Cross-Lingual Phonetic Recognition with External Language Embedding.

Interspeech2021 John B. Harvill, Yash R. Wani, Mark Hasegawa-Johnson, Narendra Ahuja, David G. Beiser, David Chestek, 
Classification of COVID-19 from Cough Using Autoregressive Predictive Coding Pretraining and Spectral Data Augmentation.

#24  | Lukás Burget | Google Scholar   DBLP
Venues: Interspeech: 39; ICASSP: 22; TASLP: 4
Years: 2022: 9; 2021: 9; 2020: 7; 2019: 12; 2018: 7; 2017: 12; 2016: 9
ISCA Sections: speaker recognition and diarization: 3; speaker recognition: 3; special session: 3; embedding and network architecture for speaker recognition: 2; large-scale evaluation of short-duration speaker verification: 2; the voices from a distance challenge: 2; speaker characterization and recognition: 2; language recognition: 2; self-supervised, semi-supervised, adaptation and data augmentation for asr: 1; speaker embedding and diarization: 1; search/decoding algorithms for asr: 1; robust speaker recognition: 1; language modeling and text-based innovations for asr: 1; linguistic components in end-to-end asr: 1; graph and end-to-end learning for speaker recognition: 1; sequence-to-sequence speech recognition: 1; zero-resource asr: 1; the 2019 automatic speaker verification spoofing and countermeasures challenge: 1; language modeling: 1; the first dihard speech diarization challenge: 1; topics in speech recognition: 1; low resource speech recognition challenge for indian languages: 1; speaker verification: 1; neural networks for language modeling: 1; multi-lingual models and adaptation for asr: 1; neural network acoustic models for asr: 1; spoken documents, spoken understanding and semantic analysis: 1; robustness in speech processing: 1
IEEE Keywords: speaker recognition: 11; bayes methods: 10; speech recognition: 7; acoustic unit discovery: 6; natural language processing: 5; variational bayes: 5; hidden markov models: 5; unsupervised learning: 4; speaker diarization: 4; pattern clustering: 4; speaker verification: 3; hmm: 3; i vector: 3; dnn: 3; speech enhancement: 2; speech synthesis: 2; topic identification: 2; optimisation: 2; dihard: 2; recurrent neural nets: 2; discriminative training: 2; bottleneck features: 2; non parametric bayesian models: 1; cross domain: 1; blind source separation: 1; speech separation: 1; dpccn: 1; mixture remix: 1; frequency domain analysis: 1; unsupervised target speech extraction: 1; time domain analysis: 1; multi channel: 1; array signal processing: 1; beamforming: 1; dataset: 1; multisv: 1; sequence to sequence: 1; self supervision: 1; cycle consistency: 1; voxconverse: 1; voxsrc challenge: 1; language translation: 1; spoken language translation: 1; coupled de coding: 1; end to end differentiable pipeline: 1; asr objective: 1; joint training: 1; auxiliary loss: 1; transformers: 1; how2 dataset: 1; hierarchical subspace model: 1; clustering: 1; pattern classification: 1; text analysis: 1; gaussian distribution: 1; embeddings: 1; bayesian methods: 1; on the fly data augmentation: 1; speaker embedding: 1; specaugment: 1; convolutional neural nets: 1; probability: 1; linear discriminant analysis: 1; x vector: 1; chime: 1; inference mechanisms: 1; softmax margin: 1; sequence learning: 1; attention models: 1; beam search training: 1; i vectors: 1; i vector extractor: 1; domain adaptation: 1; neural net architecture: 1; entropy: 1; robust asr: 1; out of vocabulary words: 1; decoding: 1; vocabulary: 1; low resource asr: 1; bayesian model: 1; informative prior: 1; signal representation: 1; text dependent speaker verification: 1; cepstral analysis: 1; hidden markov model (hmm): 1; residual memory networks: 1; lstm: 1; rnn: 1; automatic speech recognition: 1; computational complexity: 1; feedforward neural nets: 1; convergence: 1; bayesian approach: 1; joint sequence models: 1; smoothing methods: 1; weighted finite state transducers: 1; hierarchical pitman yor process: 1; grapheme to phoneme conversion: 1; expectation maximisation algorithm: 1; letter to sound: 1; nonparametric statistics: 1; non parametric bayesian models: 1; document handling: 1; audio signal processing: 1; unsupervised linear discriminant analysis: 1; evaluation methods: 1; zero resource: 1; bayesian non parametric: 1; gaussian processes: 1; automatic speaker identification: 1; mixture models: 1; deep neural networks: 1; microphones: 1; de reverberation: 1; denoising: 1; ssnn: 1; sequence summary: 1; adaptation: 1
Most Publications: 2019: 29; 2018: 26; 2022: 25; 2020: 19; 2021: 18


TASLP2022 Lucas Ondel, Bolaji Yusuf, Lukás Burget, Murat Saraçlar, 
Non-Parametric Bayesian Subspace Models for Acoustic Unit Discovery.

ICASSP2022 Jiangyu Han, Yanhua Long, Lukás Burget, Jan Cernocký, 
DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation and Extraction.

ICASSP2022 Ladislav Mosner, Oldrich Plchot, Lukás Burget, Jan Honza Cernocký, 
Multisv: Dataset for Far-Field Multi-Channel Speaker Verification.

Interspeech2022 Murali Karthick Baskar, Tim Herzig, Diana Nguyen, Mireia Díez, Tim Polzehl, Lukás Burget, Jan Cernocký, 
Speaker adaptation for Wav2vec2 based dysarthric ASR.

Interspeech2022 Niko Brummer, Albert Swart, Ladislav Mosner, Anna Silnova, Oldrich Plchot, Themos Stafylakis, Lukás Burget
Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings.

Interspeech2022 Martin Kocour, Katerina Zmolíková, Lucas Ondel, Jan Svec, Marc Delcroix, Tsubasa Ochiai, Lukás Burget, Jan Cernocký, 
Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model.

Interspeech2022 Federico Landini, Alicia Lozano-Diez, Mireia Díez, Lukás Burget
From Simulated Mixtures to Simulated Conversations as Training Data for End-to-End Neural Diarization.

Interspeech2022 Junyi Peng, Rongzhi Gu, Ladislav Mosner, Oldrich Plchot, Lukás Burget, Jan Cernocký, 
Learnable Sparse Filterbank for Speaker Verification.

Interspeech2022 Themos Stafylakis, Ladislav Mosner, Oldrich Plchot, Johan Rohdin, Anna Silnova, Lukás Burget, Jan Cernocký, 
Training speaker embedding extractors using multi-speaker audio with unknown speaker boundaries.

ICASSP2021 Murali Karthick Baskar, Lukás Burget, Shinji Watanabe 0001, Ramón Fernandez Astudillo, Jan Honza Cernocký, 
Eat: Enhanced ASR-TTS for Self-Supervised Speech Recognition.

ICASSP2021 Federico Landini, Ondrej Glembek, Pavel Matejka, Johan Rohdin, Lukás Burget, Mireia Díez, Anna Silnova, 
Analysis of the but Diarization System for Voxconverse Challenge.

ICASSP2021 Hari Krishna Vydana, Martin Karafiát, Katerina Zmolíková, Lukás Burget, Honza Cernocký, 
Jointly Trained Transformers Models for Spoken Language Translation.

ICASSP2021 Bolaji Yusuf, Lucas Ondel, Lukás Burget, Jan Cernocký, Murat Saraçlar, 
A Hierarchical Subspace Model for Language-Attuned Acoustic Unit Discovery.

Interspeech2021 Karel Benes, Lukás Burget
Text Augmentation for Language Models in High Error Recognition Scenario.

Interspeech2021 Ekaterina Egorova, Hari Krishna Vydana, Lukás Burget, Jan Cernocký, 
Out-of-Vocabulary Words Detection with Attention and CTC Alignments in an End-to-End ASR System.

Interspeech2021 Junyi Peng, Xiaoyang Qu, Rongzhi Gu, Jianzong Wang, Jing Xiao 0006, Lukás Burget, Jan Cernocký, 
Effective Phase Encoding for End-To-End Speaker Verification.

Interspeech2021 Junyi Peng, Xiaoyang Qu, Jianzong Wang, Rongzhi Gu, Jing Xiao 0006, Lukás Burget, Jan Cernocký, 
ICSpk: Interpretable Complex Speaker Embedding Extractor from Raw Waveform.

Interspeech2021 Themos Stafylakis, Johan Rohdin, Lukás Burget
Speaker Embeddings by Modeling Channel-Wise Correlations.

TASLP2020 Mireia Díez, Lukás Burget, Federico Landini, Jan Cernocký, 
Analysis of Speaker Diarization Based on Bayesian HMM With Eigenvoice Priors.

TASLP2020 Santosh Kesiraju, Oldrich Plchot, Lukás Burget, Suryakanth V. Gangashetty, 
Learning Document Embeddings Along With Their Uncertainties.

#25  | Najim Dehak | Google Scholar   DBLP
Venues: Interspeech: 45; ICASSP: 17; TASLP: 2
Years: 2022: 6; 2021: 12; 2020: 12; 2019: 12; 2018: 12; 2017: 6; 2016: 4
ISCA Sections: trustworthy speech processing: 3; robust speaker recognition: 2; speaker recognition and diarization: 2; non-autoregressive sequential modeling for speech processing: 2; the attacker’s perspective on automatic speaker verification: 2; speaker recognition evaluation: 2; speaker verification: 2; speaker state and trait: 2; language recognition: 2; self supervision and anti-spoofing: 1; voice activity detection and keyword spotting: 1; the adresso challenge: 1; embedding and network architecture for speaker recognition: 1; voice anti-spoofing and countermeasure: 1; lm adaptation, lexical units and punctuation: 1; the zero resource speech challenge 2020: 1; speaker embedding: 1; alzheimer’s dementia recognition through spontaneous speech: 1; phonetic event detection and segmentation: 1; spoken term detection: 1; speaker recognition and anti-spoofing: 1; the 2019 automatic speaker verification spoofing and countermeasures challenge: 1; speech and voice disorders: 1; representation learning of emotion and paralinguistics: 1; the voices from a distance challenge: 1; nn architectures for asr: 1; language identification: 1; representation learning for emotion: 1; deep neural networks: 1; the first dihard speech diarization challenge: 1; extracting information from audio: 1; topics in speech recognition: 1; pathological speech and language: 1; speaker recognition: 1; special session: 1
IEEE Keywords: speaker recognition: 10; speech recognition: 7; emotion recognition: 3; speaker verification: 3; supervised learning: 2; automatic speech recognition: 2; unsupervised learning: 2; perceptual loss: 2; natural language processing: 2; audio signal processing: 2; signal denoising: 2; transfer learning: 2; probability: 2; feature enhancement: 2; speech enhancement: 2; medical signal processing: 2; diseases: 2; acoustic unit discovery: 2; understanding: 1; object detection: 1; connectionist temporal classification: 1; regularization: 1; attention: 1; speech synthesis: 1; text to speech: 1; decoding: 1; multilingual: 1; phonotactics: 1; zero shot learning: 1; multi task learning: 1; self supervised features: 1; speech denoising: 1; deep learning (artificial intelligence): 1; signal classification: 1; pre trained networks: 1; copypaste: 1; data augmentation: 1; x vector: 1; septoplasty: 1; surgery: 1; tonsillectomy: 1; automatic speaker recognition: 1; sinus surgery: 1; deep feature loss: 1; speech: 1; parkinson’s disease: 1; medical disorders: 1; i vectors: 1; x vectors: 1; patient diagnosis: 1; neurophysiology: 1; linear discriminant analysis: 1; data handling: 1; dereverberation: 1; far field adaptation: 1; cyclegan: 1; channel bank filters: 1; x vector: 1; pre trained: 1; cold fusion: 1; storage management: 1; deep fusion: 1; sequence to sequence: 1; shallow fusion: 1; language model: 1; automatic speech recognition (asr): 1; anti spoofing: 1; asvspoof: 1; automatic speaker verification: 1; spoofing attack: 1; filtering theory: 1; security of data: 1; replay attacks: 1; generative adversarial neural networks (gans): 1; unsupervised domain adaptation: 1; cycle gans: 1; microphones: 1; regression analysis: 1; mean square error methods: 1; gaussian processes: 1; lstm: 1; rnn: 1; uncertainty estimation: 1; age issues: 1; age estimation: 1; speaker diarization: 1; far field speech: 1; nonparametric statistics: 1; non parametric bayesian models: 1; bayes methods: 1; document handling: 1; topic identification: 1; unsupervised linear discriminant analysis: 1; evaluation methods: 1; zero resource: 1; parkinson's disease: 1; gcca: 1; multi view learning: 1; handwriting processing: 1; frenchay dysarthria assessment: 1; updrs: 1; gait processing: 1; patient treatment: 1; senone posteriors: 1; i vector: 1; deep neural networks (dnns): 1; acoustic unit discovery (aud): 1; language recognition: 1; bottleneck features: 1
Most Publications: 2021: 30; 2018: 30; 2020: 29; 2019: 28; 2022: 21


Interspeech2022 Jaejin Cho, Raghavendra Pappagari, Piotr Zelasko, Laureano Moro-Velázquez, Jesús Villalba 0001, Najim Dehak
Non-contrastive self-supervised learning of utterance-level speech representations.

Interspeech2022 Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesús Villalba 0001, Sanjeev Khudanpur, Najim Dehak
Defense against Adversarial Attacks on Hybrid Speech Recognition System using Adversarial Fine-tuning with Denoiser.

Interspeech2022 Sonal Joshi, Saurabh Kataria, Jesús Villalba 0001, Najim Dehak
AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification.

Interspeech2022 Saurabh Kataria, Jesús Villalba 0001, Laureano Moro-Velázquez, Najim Dehak
Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification.

Interspeech2022 Magdalena Rybicka, Jesús Villalba 0001, Najim Dehak, Konrad Kowalczyk, 
End-to-End Neural Speaker Diarization with an Iterative Refinement of Non-Autoregressive Attention-based Attractors.

Interspeech2022 Yiwen Shao, Jesús Villalba 0001, Sonal Joshi, Saurabh Kataria, Sanjeev Khudanpur, Najim Dehak
Chunking Defense for Adversarial Attacks on ASR.

ICASSP2021 Nanxin Chen, Piotr Zelasko, Jesús Villalba 0001, Najim Dehak
Focus on the Present: A Regularization Method for the ASR Source-Target Attention Layer.

ICASSP2021 Jaejin Cho, Piotr Zelasko, Jesús Villalba 0001, Najim Dehak
Improving Reconstruction Loss Based Speaker Embedding in Unsupervised and Semi-Supervised Scenarios.

ICASSP2021 Siyuan Feng 0001, Piotr Zelasko, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak
How Phonotactics Affect Multilingual and Zero-Shot ASR Performance.

ICASSP2021 Saurabh Kataria, Jesús Villalba 0001, Najim Dehak
Perceptual Loss Based Speech Denoising with an Ensemble of Audio Pattern Recognition and Self-Supervised Models.

ICASSP2021 Raghavendra Pappagari, Jesús Villalba 0001, Piotr Zelasko, Laureano Moro-Velázquez, Najim Dehak
CopyPaste: An Augmentation Method for Speech Emotion Recognition.

Interspeech2021 Saurabhchand Bhati, Jesús Villalba 0001, Piotr Zelasko, Laureano Moro-Velázquez, Najim Dehak
Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation.

Interspeech2021 Nanxin Chen, Piotr Zelasko, Laureano Moro-Velázquez, Jesús Villalba 0001, Najim Dehak
Align-Denoise: Single-Pass Non-Autoregressive Speech Recognition.

Interspeech2021 Nanxin Chen, Yu Zhang 0033, Heiga Zen, Ron J. Weiss, Mohammad Norouzi 0002, Najim Dehak, William Chan, 
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis.

Interspeech2021 Saurabh Kataria, Jesús Villalba 0001, Piotr Zelasko, Laureano Moro-Velázquez, Najim Dehak
Deep Feature CycleGANs: Speaker Identity Preserving Non-Parallel Microphone-Telephone Domain Adaptation for Speaker Verification.

Interspeech2021 Raghavendra Pappagari, Jaejin Cho, Sonal Joshi, Laureano Moro-Velázquez, Piotr Zelasko, Jesús Villalba 0001, Najim Dehak
Automatic Detection and Assessment of Alzheimer Disease Using Speech and Language Technologies in Low-Resource Scenarios.

Interspeech2021 Magdalena Rybicka, Jesús Villalba 0001, Piotr Zelasko, Najim Dehak, Konrad Kowalczyk, 
Spine2Net: SpineNet with Res2Net and Time-Squeeze-and-Excitation Blocks for Speaker Recognition.

Interspeech2021 Jesús Villalba 0001, Sonal Joshi, Piotr Zelasko, Najim Dehak
Representation Learning to Classify and Detect Adversarial Attacks Against Speaker and Speech Recognition Systems.

TASLP2020 Laureano Moro-Velázquez, Estefanía Hernández-García, Jorge Andrés Gómez García, Juan Ignacio Godino-Llorente, Najim Dehak
Analysis of the Effects of Supraglottal Tract Surgical Procedures in Automatic Speaker Recognition Performance.

ICASSP2020 Saurabh Kataria, Phani Sankar Nidadavolu, Jesús Villalba 0001, Nanxin Chen, L. Paola García-Perera, Najim Dehak
Feature Enhancement with Deep Feature Losses for Speaker Verification.

#26  | Marc Delcroix | Google Scholar   DBLP
Venues: Interspeech: 38; ICASSP: 24; TASLP: 2
Years: 2022: 10; 2021: 13; 2020: 12; 2019: 9; 2018: 7; 2017: 9; 2016: 4
ISCA Sections: source separation: 3; adjusting to speaker, accent, and domain: 2; acoustic models for asr: 2; dereverberation, noise reduction, and speaker extraction: 1; speech enhancement and intelligibility: 1; speaker embedding and diarization: 1; search/decoding algorithms for asr: 1; novel models and training methods for asr: 1; single-channel speech enhancement: 1; speaker diarization: 1; streaming for asr/rnn transducers: 1; source separation, dereverberation and echo cancellation: 1; speech localization, enhancement, and quality assessment: 1; target speaker detection, localization and separation: 1; monaural source separation: 1; asr neural network architectures and training: 1; diarization: 1; targeted source separation: 1; lm adaptation, lexical units and punctuation: 1; asr for noisy and far-field speech: 1; asr neural network architectures: 1; speech and audio source separation and scene analysis: 1; neural networks for language modeling: 1; distant asr: 1; end-to-end speech recognition: 1; source separation and auditory scene analysis: 1; far-field speech recognition: 1; speech-enhancement: 1; noise robust and far-field asr: 1; multi-channel speech enhancement: 1; acoustic model adaptation: 1; speech enhancement and noise reduction: 1; far-field, robustness and adaptation: 1; robustness in speech processing: 1
IEEE Keywords: speech recognition: 19; speaker recognition: 10; speech enhancement: 9; source separation: 6; natural language processing: 5; neural network: 5; recurrent neural nets: 3; blind source separation: 3; array signal processing: 3; reverberation: 3; text analysis: 2; sensor fusion: 2; automatic speech recognition: 2; gaussian processes: 2; mixture models: 2; speech extraction: 2; speech separation: 2; convolution: 2; convolutional neural nets: 2; online processing: 2; dynamic stream weights: 2; audio signal processing: 2; beamforming: 2; target speech extraction: 2; time domain network: 2; robust asr: 2; time domain analysis: 2; backpropagation: 2; hidden markov models: 2; joint training: 2; adaptation: 2; auxiliary feature: 2; speaker extraction: 2; language translation: 1; speech summarization: 1; attention fusion: 1; speech translation: 1; rover: 1; infinite gmm: 1; bayes methods: 1; diarization: 1; pattern clustering: 1; recurrent neural network transducer: 1; end to end: 1; attention based decoder: 1; noise robust speech recognition: 1; speakerbeam: 1; deep learning (artificial intelligence): 1; input switching: 1; complex backpropagation: 1; transfer functions: 1; signal to distortion ratio: 1; multi channel source separation: 1; acoustic beamforming: 1; meeting recognition: 1; speaker activity: 1; continuous speech separation: 1; long recording speech separation: 1; dual path modeling: 1; transforms: 1; estimation theory: 1; imbalanced datasets: 1; confidence estimation: 1; auxiliary features: 1; bidirectional long short term memory (blstm): 1; end to end (e2e) speech recognition: 1; audio visual systems: 1; audiovisual speaker localization: 1; data fusion: 1; video signal processing: 1; image fusion: 1; optimisation: 1; maximum likelihood estimation: 1; dereverberation: 1; filtering theory: 1; microphone array: 1; spatial features: 1; multi task loss: 1; microphone arrays: 1; single channel speech enhancement: 1; signal denoising: 1; multi task learning: 1; auxiliary information: 1; and multi head self attention: 1; time domain: 1; frequency domain analysis: 1; multi speaker speech recognition: 1; computational complexity: 1; end to end speech recognition: 1; tracking: 1; backprop kalman filter: 1; audiovisual speaker tracking: 1; kalman filters: 1; adversarial learning: 1; speaker embedding: 1; phoneme invariant feature: 1; text independent speaker recognition: 1; signal classification: 1; deep neural networks: 1; domain adaptation: 1; topic model: 1; recurrent neural network language model: 1; sequence summary network: 1; decoding: 1; encoder decoder: 1; semi supervised learning: 1; autoencoder: 1; encoding: 1; speech synthesis: 1; source counting: 1; meeting diarization: 1; speech separation/extraction: 1; speaker attention: 1; acoustic modeling: 1; adaptive training: 1; deep neural network: 1; acoustic model adaptation: 1; feedforward neural nets: 1; speech mixtures: 1; spatial filters: 1; speaker adaptive neural network: 1; context adaptation: 1; spatial diffuseness features: 1; cnn based acoustic model: 1; environmental robustness: 1; inverse problems: 1; conditional density: 1; model based feature enhancement: 1; mixture density network: 1
Most Publications: 2021: 38; 2020: 29; 2022: 27; 2017: 24; 2019: 16


ICASSP2022 Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe 0001, 
Integrating Multiple ASR Systems into NLP Backend with Attention Fusion.

ICASSP2022 Keisuke Kinoshita, Marc Delcroix, Tomoharu Iwata, 
Tight Integration Of Neural- And Clustering-Based Diarization Through Deep Unfolding Of Infinite Gaussian Mixture Model.

ICASSP2022 Takafumi Moriya, Takanori Ashihara, Atsushi Ando, Hiroshi Sato, Tomohiro Tanaka, Kohei Matsuura, Ryo Masumura, Marc Delcroix, Takahiro Shinozaki, 
Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration.

ICASSP2022 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Naoyuki Kamo, Takafumi Moriya, 
Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition.

Interspeech2022 Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolíková, Hiroshi Sato, Tomohiro Nakatani, 
Listen only to me! How well can target speech extraction handle false alarms?

Interspeech2022 Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri, 
How bad are artifacts?: Analyzing the impact of speech enhancement errors on ASR.

Interspeech2022 Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Christoph Böddeker, Reinhold Haeb-Umbach, 
Utterance-by-utterance overlap-aware neural diarization with Graph-PIT.

Interspeech2022 Martin Kocour, Katerina Zmolíková, Lucas Ondel, Jan Svec, Marc Delcroix, Tsubasa Ochiai, Lukás Burget, Jan Cernocký, 
Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model.

Interspeech2022 Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki, 
Streaming Target-Speaker ASR with Neural Transducer.

Interspeech2022 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura, 
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.

ICASSP2021 Christoph Böddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Reinhold Haeb-Umbach, 
Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.

ICASSP2021 Marc Delcroix, Katerina Zmolíková, Tsubasa Ochiai, Keisuke Kinoshita, Tomohiro Nakatani, 
Speaker Activity Driven Neural Speech Extraction.

ICASSP2021 Chenda Li, Zhuo Chen 0006, Yi Luo 0004, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe 0001, Yanmin Qian, 
Dual-Path Modeling for Long Recording Speech Separation in Meetings.

ICASSP2021 Atsunori Ogawa, Naohiro Tawara, Takatomo Kano, Marc Delcroix
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition.

ICASSP2021 Julio Wissing, Benedikt T. Boenninghoff, Dorothea Kolossa, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Christopher Schymura, 
Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain.

Interspeech2021 Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Shoko Araki, 
Few-Shot Learning of New Sound Classes for Target Sound Extraction.

Interspeech2021 Cong Han, Yi Luo 0004, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe 0001, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen 0006, 
Continuous Speech Separation Using Speaker Inventory for Long Recording.

Interspeech2021 Keisuke Kinoshita, Marc Delcroix, Naohiro Tawara, 
Advances in Integration of End-to-End Neural and Clustering-Based Diarization for Real Conversational Speech.

Interspeech2021 Takafumi Moriya, Tomohiro Tanaka, Takanori Ashihara, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Ryo Masumura, Marc Delcroix, Taichi Asami, 
Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture.

Interspeech2021 Thilo von Neumann, Keisuke Kinoshita, Christoph Böddeker, Marc Delcroix, Reinhold Haeb-Umbach, 
Graph-PIT: Generalized Permutation Invariant Training for Continuous Separation of Arbitrary Numbers of Speakers.

#27  | Sanjeev Khudanpur | Google Scholar   DBLP
VenuesInterspeech: 40ICASSP: 22TASLP: 1
Years2022: 42021: 92020: 62019: 102018: 182017: 102016: 6
ISCA Sectionspeaker recognition evaluation: 3trustworthy speech processing: 2the voices from a distance challenge: 2acoustic models for asr: 2speaker and language recognition: 1tools, corpora and resources: 1language and accent recognition: 1source separation: 1graph and end-to-end learning for speaker recognition: 1linguistic components in end-to-end asr: 1feature extraction and distant asr: 1lm adaptation, lexical units and punctuation: 1neural networks for language modeling: 1asr neural network architectures and training: 1summarization, semantic analysis and classification: 1nn architectures for asr: 1spoken language processing for children’s speech: 1speaker recognition and diarization: 1recurrent neural models for asr: 1novel neural network architectures for acoustic modelling: 1robust speech recognition: 1speaker state and trait: 1end-to-end speech recognition: 1language modeling: 1acoustic modelling: 1the first dihard speech diarization challenge: 1extracting information from audio: 1search, computational strategies and language modeling: 1topic spotting, entity extraction and semantic analysis: 1speaker recognition: 1spoken term detection: 1lexical and pronunciation modeling: 1acoustic modeling with neural networks: 1far-field speech processing: 1topics in speech recognition: 1
IEEE Keywordspeech recognition: 16automatic speech recognition: 8natural language processing: 6transformer: 3speaker recognition: 3acoustic unit discovery: 3recurrent neural nets: 3speech enhancement: 2speech separation: 2decoding: 2lattice rescoring: 2speaker diarization: 2x vectors: 2deep neural networks: 2probability: 2lattice free mmi: 2sequence training: 2bayes methods: 2lstm: 2fourier transforms: 1channel bank filters: 1self supervised learning: 1speech coding: 1lattice pruning: 1lattice generation: 1decoder: 1parallel computation: 1neural language models: 1parallel processing: 1deep learning (artificial intelligence): 1noisy speech: 1source separation: 1signal denoising: 1convolutional neural nets: 1voice activity detection: 1lf mmi: 1streaming: 1computational complexity: 1wake word detection: 1gradient methods: 1language model adaptation: 1linear interpolation: 1neural language model: 1interpolation: 1merging: 1acoustic modeling: 1kaldi: 1array signal processing: 1chime 5 challenge: 1robust speech recognition: 1microphone arrays: 1hidden markov models: 1flat start: 1lattice free: 1maximum mutual information: 1single stage: 1far field speech: 1semi supervised training: 1low resource asr: 1bayesian model: 1informative prior: 1asr: 1neural network: 1attention: 1diarization: 1overlap detection: 1data augmentation: 1recurrent neural network language model: 1approximation theory: 1heuristic search: 1importance sampling: 1vocabulary: 1recurrent neural networks: 1language modeling: 1nonparametric statistics: 1non parametric bayesian models: 1document handling: 1topic identification: 1audio signal processing: 1unsupervised learning: 1augmentation: 1room impulse responses: 1deep neural network: 1reverberation: 1unsupervised linear discriminant analysis: 1evaluation methods: 1zero resource: 1point process model: 1query processing: 1keyword search: 1signal detection: 1asr for under resourced languages: 1asr adaptation: 1mismatched transcriptions: 1highway lstm: 1cntk: 1
Most Publications2018: 302021: 262019: 212020: 192017: 18

Affiliations
URLs

ICASSP2022 Zili Huang, Shinji Watanabe 0001, Shu-Wen Yang, Paola García, Sanjeev Khudanpur
Investigating Self-Supervised Learning for Speech Enhancement and Separation.

Interspeech2022 Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesús Villalba 0001, Sanjeev Khudanpur, Najim Dehak, 
Defense against Adversarial Attacks on Hybrid Speech Recognition System using Adversarial Fine-tuning with Denoiser.

Interspeech2022 Hexin Liu, Leibny Paola García-Perera, Andy W. H. Khong, Suzy J. Styles, Sanjeev Khudanpur
PHO-LID: A Unified Model Incorporating Acoustic-Phonetic and Phonotactic Information for Language Identification.

Interspeech2022 Yiwen Shao, Jesús Villalba 0001, Sonal Joshi, Saurabh Kataria, Sanjeev Khudanpur, Najim Dehak, 
Chunking Defense for Adversarial Attacks on ASR.

ICASSP2021 Hang Lv 0001, Zhehuai Chen, Hainan Xu, Daniel Povey, Lei Xie 0001, Sanjeev Khudanpur
An Asynchronous WFST-Based Decoder for Automatic Speech Recognition.

ICASSP2021 Ke Li 0018, Daniel Povey, Sanjeev Khudanpur
A Parallelizable Lattice Rescoring Strategy with Neural Language Models.

ICASSP2021 Matthew Maciejewski, Jing Shi 0003, Shinji Watanabe 0001, Sanjeev Khudanpur
Training Noisy Single-Channel Speech Separation with Noisy Oracle Sources: A Large Gap and a Small Step.

ICASSP2021 Yiming Wang, Hang Lv 0001, Daniel Povey, Lei Xie 0001, Sanjeev Khudanpur
Wake Word Detection with Streaming Transformers.

Interspeech2021 Guoguo Chen, Shuzhou Chai, Guan-Bo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su 0002, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe 0001, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Zhao You, Zhiyong Yan, 
GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10,000 Hours of Transcribed Audio.

Interspeech2021 Hexin Liu, Leibny Paola García-Perera, Xinyi Zhang, Justin Dauwels, Andy W. H. Khong, Sanjeev Khudanpur, Suzy J. Styles, 
End-to-End Language Diarization for Bilingual Code-Switching Speech.

Interspeech2021 Matthew Maciejewski, Shinji Watanabe 0001, Sanjeev Khudanpur
Speaker Verification-Based Evaluation of Single-Channel Speech Separation.

Interspeech2021 Desh Raj, Sanjeev Khudanpur
Reformulating DOVER-Lap Label Mapping as a Graph Partitioning Problem.

Interspeech2021 Matthew Wiesner, Mousmita Sarma, Ashish Arora, Desh Raj, Dongji Gao, Ruizhe Huang, Supreet Preet, Moris Johnson, Zikra Iqbal, Nagendra Goel, Jan Trmal, Leibny Paola García-Perera, Sanjeev Khudanpur
Training Hybrid Models on Noisy Transliterated Transcripts for Code-Switched Speech Recognition.

ICASSP2020 Ke Li 0018, Zhe Liu 0011, Tianxing He, Hongzhao Huang, Fuchun Peng, Daniel Povey, Sanjeev Khudanpur
An Empirical Study of Transformer-Based Neural Language Model Adaptation.

Interspeech2020 Pegah Ghahramani, Hossein Hadian, Daniel Povey, Hynek Hermansky, Sanjeev Khudanpur
An Alternative to MFCCs for ASR.

Interspeech2020 Ruizhe Huang, Ke Li 0018, Ashish Arora, Daniel Povey, Sanjeev Khudanpur
Efficient MDI Adaptation for n-Gram Language Models.

Interspeech2020 Ke Li 0018, Daniel Povey, Sanjeev Khudanpur
Neural Language Modeling with Implicit Cache Pointers.

Interspeech2020 Yiwen Shao, Yiming Wang, Daniel Povey, Sanjeev Khudanpur
PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR.

Interspeech2020 Yiming Wang, Hang Lv 0001, Daniel Povey, Lei Xie 0001, Sanjeev Khudanpur
Wake Word Detection with Alignment-Free Lattice-Free MMI.

ICASSP2019 Vimal Manohar, Szu-Jui Chen, Zhiqi Wang, Yusuke Fujita, Shinji Watanabe 0001, Sanjeev Khudanpur
Acoustic Modeling for Overlapping Speech Recognition: Jhu Chime-5 Challenge System.

#28  | Kai Yu 0004 | Google Scholar   DBLP
VenuesICASSP: 22Interspeech: 21TASLP: 15SpeechComm: 3EMNLP: 1
Years2022: 112021: 62020: 132019: 102018: 92017: 62016: 7
ISCA Sectionspeech synthesis: 2speaker recognition: 2speaker embedding and diarization: 1language and lexical modeling for asr: 1voice activity detection and keyword spotting: 1phonetic event detection and segmentation: 1spoken language understanding: 1anti-spoofing and liveness detection: 1spoken term detection, confidence measure, and end-to-end speech recognition: 1speaker recognition and anti-spoofing: 1the 2019 automatic speaker verification spoofing and countermeasures challenge: 1speaker verification using neural network methods: 1acoustic modelling: 1prosody and text processing: 1short utterances speaker recognition: 1search, computational strategies and language modeling: 1decoding, system combination: 1dialogue systems and analysis of dialogue: 1spoken term detection: 1
IEEE Keywordspeech recognition: 13natural language processing: 12speaker recognition: 11text analysis: 6speech synthesis: 5recurrent neural nets: 5audio signal processing: 4gaussian processes: 3convolutional neural networks: 3speaker verification: 3feedforward neural nets: 3decoding: 2mixture models: 2lattice to sequence: 2unsupervised learning: 2adversarial training: 2teacher student learning: 2hidden markov models: 2data handling: 2i vector: 2text dependent speaker verification: 2data augmentation: 2video signal processing: 2semi supervised learning: 2slot filling: 2speaker embedding: 2multitask learning: 2interactive systems: 2attention models: 2speech coding: 2signal classification: 2end to end: 2convolution: 2spoofing detection: 2robust speech recognition: 2spoken language understanding: 2probability: 1fastspeech2: 1speech codecs: 1voice cloning: 1autoregressive processes: 1unit selection: 1mixture density network: 1prosody modelling: 1prosody cloning: 1pre trained language model: 1algebra: 1lattice to lattice: 1word level prosody: 1decision trees: 1prosody tagging: 1prosody control: 1supervised learning: 1category adaptation: 1speech enhancement: 1weakly supervised learning: 1source separation: 1deep neural networks: 1cross modal: 1information retrieval: 1audio text retrieval: 1pre trained model: 1aggregation: 1text prompt: 1training detection criteria: 1pattern classification: 1arbitrary wake word: 1entropy: 1streaming: 1wake word detection: 1conditional generation: 1audio captioning: 1diverse caption generation: 1voice activity detection: 1speech activity detection. weakly supervised learning: 1teacher training: 1text to audio grounding: 1sound event detection: 1dataset: 1music: 1natural languages: 1variational auto encoder: 1text independent speaker verification: 1generative adversarial network: 1storage management: 1data compression: 1quantisation (signal): 1neural network language model: 1product quantization: 1binarization: 1natural language understanding (nlu): 1intent detection: 1dual learning: 1domain adaptation: 1natural language understanding: 1prior knowledge: 1label embedding: 1on the fly data augmentation: 1specaugment: 1convolutional neural nets: 1channel information: 1variational autoencoder: 1low resource: 1hierarchical: 1data sparsity: 1dialogue state tracking: 1polysemy: 1word processing: 1multi sense embeddings: 1language modeling: 1distributed representation: 1word lattice: 1search problems: 1forward backward algorithm: 1dialogue policy: 1ontologies (artificial intelligence): 1transfer learning: 1graph theory: 1graph neural networks: 1deep reinforcement learning: 1policy adaptation: 1multi agent systems: 1speaker neural embedding: 1triplet loss: 1angular softmax: 1center loss: 1short duration text independent speaker verification: 1knowledge distillation: 1computer aided instruction: 1language translation: 1audio databases: 1audio caption: 1recurrent neural networks: 1natural language generation: 1security of data: 1convolutional neural network: 1factor aware training: 1cluster adaptive training: 1residual learning: 1reverberation: 1sequences: 1sequence to sequence learning: 1question and answer: 1chatbot: 1short text conversation (stc): 1adversarial task discriminator: 1speech intelligibility: 1dilated convolution: 1co channel speaker identification: 1focal loss: 1adversarial adaptation: 1asr error robustness: 1signal representation: 1lattice: 1decoder: 1lvcsr: 1kws: 1wfst: 1dlss: 1lattice theory: 1ctc: 1cldnn: 1btas2016: 1acoustic modeling: 1very deep cnns: 1matrix algebra: 1pattern clustering: 1interpolation: 1robustness: 1vad: 1signal detection: 1multi task learning: 1speaker adaptation: 1deep neural network: 1noise robustness: 1
Most Publications2020: 492022: 332021: 312019: 302018: 27

Affiliations
Shanghai Jiao Tong University, Computer Science and Engineering Department, China
Cambridge University, Engineering Department, UK (PhD 2006)

SpeechComm2022 Bo Chen, Zhihang Xu, Kai Yu 0004
Data augmentation based non-parallel voice conversion with frame-level speaker disentangler.

TASLP2022 Bo Chen, Chenpeng Du, Kai Yu 0004
Neural Fusion for Voice Cloning.

TASLP2022 Chenpeng Du, Kai Yu 0004
Phone-Level Prosody Modelling With GMM-Based MDN for Diverse and Controllable Speech Synthesis.

ICASSP2022 Lingfeng Dai, Lu Chen 0002, Zhikai Zhou, Kai Yu 0004
LatticeBART: Lattice-to-Lattice Pre-Training for Speech Recognition.

ICASSP2022 Yiwei Guo, Chenpeng Du, Kai Yu 0004
Unsupervised Word-Level Prosody Tagging for Controllable Speech Synthesis.

ICASSP2022 Guangwei Li, Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu 0004
Category-Adapted Sound Event Enhancement with Weakly Labeled Data.

ICASSP2022 Siyu Lou, Xuenan Xu, Mengyue Wu, Kai Yu 0004
Audio-Text Retrieval in Context.

ICASSP2022 Yu Xi, Tian Tan 0002, Wangyou Zhang, Baochen Yang, Kai Yu 0004
Text Adaptive Detection for Customizable Keyword Spotting.

ICASSP2022 Xuenan Xu, Mengyue Wu, Kai Yu 0004
Diversity-Controllable and Accurate Audio Captioning Based on Neural Condition.

Interspeech2022 Chenpeng Du, Yiwei Guo, Xie Chen 0001, Kai Yu 0004
VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature.

Interspeech2022 Tao Liu, Shuai Fan 0005, Xu Xiang, Hongbo Song, Shaoxiong Lin, Jiaqi Sun, Tianyuan Han, Siyuan Chen, Binwei Yao, Sen Liu, Yifei Wu, Yanmin Qian, Kai Yu 0004
MSDWild: Multi-modal Speaker Diarization Dataset in the Wild.

TASLP2021 Heinrich Dinkel, Shuai Wang 0016, Xuenan Xu, Mengyue Wu, Kai Yu 0004
Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training.

ICASSP2021 Chenpeng Du, Bing Han, Shuai Wang 0016, Yanmin Qian, Kai Yu 0004
SynAug: Synthesis-Based Data Augmentation for Text-Dependent Speaker Verification.

ICASSP2021 Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu 0004
Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events.

Interspeech2021 Lingfeng Dai, Qi Liu, Kai Yu 0004
Class-Based Neural Network Language Model for Second-Pass Rescoring in ASR.

Interspeech2021 Chenpeng Du, Kai Yu 0004
Rich Prosody Diversity Modelling with Phone-Level Mixture Density Network.

Interspeech2021 Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu 0004
A Lightweight Framework for Online Voice Activity Detection in the Wild.

TASLP2020 Shuai Wang 0016, Yexin Yang, Zhanghao Wu, Yanmin Qian, Kai Yu 0004
Data Augmentation Using Deep Generative Models for Embedding Based Speaker Recognition.

TASLP2020 Kai Yu 0004, Rao Ma, Kaiyu Shi, Qi Liu 0018, 
Neural Network Language Model Compression With Product Quantization and Soft Binarization.

TASLP2020 Su Zhu, Ruisheng Cao, Kai Yu 0004
Dual Learning for Semi-Supervised Natural Language Understanding.

#29  | Chin-Hui Lee | Google Scholar   DBLP
VenuesInterspeech: 28ICASSP: 18TASLP: 11SpeechComm: 2
Years2022: 62021: 82020: 132019: 102018: 92017: 82016: 5
ISCA Sectionacoustic scene classification: 2multi-channel speech enhancement: 2speech enhancement: 2far-field speech recognition: 2speech enhancement and noise reduction: 2spoken language processing: 1speaker embedding and diarization: 1acoustic scene analysis: 1spoken dialogue systems and multimodality: 1multimodal systems: 1speaker diarization: 1privacy-preserving machine learning for audio & speech processing: 1single-channel speech enhancement: 1voice activity detection and keyword spotting: 1speech emotion recognition: 1speech coding and evaluation: 1speech and audio classification: 1deep enhancement: 1the first dihard speech diarization challenge: 1noise robust and far-field asr: 1speech recognition: 1source separation and auditory scene analysis: 1special session: 1
IEEE Keywordspeech enhancement: 18speech recognition: 15regression analysis: 8deep neural network: 6speaker recognition: 5recurrent neural nets: 5reverberation: 4automatic speech recognition: 3deep neural network (dnn): 3speech intelligibility: 3maximum likelihood estimation: 3noise: 3speaker diarization: 2voice activity detection: 2neural net architecture: 2probability: 2progressive learning: 2improved minima controlled recursive averaging: 2post processing: 2dense structure: 2teacher student learning: 2least mean squares methods: 2mean square error methods: 2gaussian distribution: 2generalized gaussian distribution: 2convolutional neural nets: 2ideal ratio mask: 2gaussian processes: 2natural language processing: 2computer assisted pronunciation training (capt): 2deep learning based speech enhancement: 2statistical speech enhancement: 2hidden markov models: 2transfer learning: 2misp challenge: 1audio visual systems: 1wake word spotting: 1public domain software: 1audio visual: 1microphone array: 1speech coding: 1ts vad: 1decoding: 1m2met: 1optimisation: 1acoustic model: 1entropy: 1cross entropy: 1robust automatic speech recognition: 1adaptive noise and speech estimation: 1and federated learning: 1acoustic modeling: 1data privacy: 1quantum machine learning: 1snr progressive learning: 1microphone arrays: 1neural network: 1domain adaptation: 1backpropagation: 1knowledge representation: 1label embedding: 1maximum likelihood: 1multi objective learning: 1shape factors update: 1tensors: 1tensor train network: 1tensor to vector regression: 12d to 2d mapping: 1fuzzy neural nets: 1fully convolutional neural network: 1performance evaluation: 1child speech extraction: 1speech separation: 1measures: 1realistic conditions: 1signal classification: 1source separation: 1speech recognition safety: 1adversarial robustness: 1adversarial examples: 1robust speech recognition: 1gradient methods: 1prediction error modeling: 1non native tone modeling and mispronunciation detection: 1pattern classification: 1computer assisted language learning (call): 1vector to vector regression: 1expressive power: 1function approximation: 1universal approximation: 1noise robust speech recognition: 1error statistics: 1improved speech presence probability: 1gain function: 1acoustic noise: 1signal denoising: 1multiobjective ensembling: 1cepstral analysis: 1speech enhancement (se): 1compact and low latency design: 1multiobjective learning: 1long short term memory: 1tone recognition and mispronunciation detection: 1computer assistant language learning (call): 1densely connected progressive learning: 1multiple target learning: 1filtering theory: 1highly mismatch condition: 1recurrent networks: 1recurrent neural networks: 1convolution: 1regression model: 1convolutional neural networks: 1feedforward neural nets: 1bayesian learning: 1unsupervised speaker adaptation: 1bayes methods: 1deep neural networks: 1prior evolution: 1online adaptation: 1gender mixture detection: 1speaker clustering: 1unsupervised speech separation: 1mixture models: 1gender issues: 1speaker dissimilarity measure: 1reverberation time aware (rta): 1deep neural networks (dnns): 1speech dereverberation: 1mean variance normalization: 1frame shift: 1linear output layer: 1acoustic context: 1model stacking: 1multi task training: 1model compression: 1english corpus: 1i vector system: 1attribute detectors: 1signal representation: 1natural languages: 1finnish corpus: 1linguistics: 1estimation theory: 1keyword spotting: 1large vocabulary continuous speech recognition (lvcsr): 1under resourced languages: 1spoken term detection (std): 1automatic speech recognition (asr): 1
Most Publications2020: 352016: 242021: 232015: 232007: 22

Affiliations
URLs

ICASSP2022 Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines And Results.

ICASSP2022 Maokui He, Xiang Lv, Weilin Zhou, Jingjing Yin, Xiaoqi Zhang, Yuxuan Wang, Shutong Niu, Yuhang Cao, Heng Lu, Jun Du, Chin-Hui Lee
The USTC-Ximalaya System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription (M2MeT) Challenge.

Interspeech2022 Hang Chen, Jun Du, Yusheng Dai, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Baocai Yin, Jia Pan, 
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.

Interspeech2022 Mao-Kui He, Jun Du, Chin-Hui Lee
End-to-End Audio-Visual Neural Speaker Diarization.

Interspeech2022 Yajian Wang, Jun Du, Hang Chen, Qing Wang 0008, Chin-Hui Lee
Deep Segment Model for Acoustic Scene Classification.

Interspeech2022 Hengshun Zhou, Jun Du, Gongzhen Zou, Zhaoxu Nian, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Shifu Xiong, Jianqing Gao, 
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis.

TASLP2021 Li Chai 0002, Jun Du, Qing-Feng Liu, Chin-Hui Lee
A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement.

ICASSP2021 Zhaoxu Nian, Yan-Hui Tu, Jun Du, Chin-Hui Lee
A Progressive Learning Approach to Adaptive Noise and Speech Estimation for Speech Enhancement and Noisy Speech Recognition.

ICASSP2021 Chao-Han Huck Yang, Jun Qi 0002, Samuel Yen-Chi Chen, Pin-Yu Chen, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee
Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition.

Interspeech2021 Hang Chen, Jun Du, Yu Hu 0003, Li-Rong Dai 0001, Bao-Cai Yin, Chin-Hui Lee
Automatic Lip-Reading with Hierarchical Pyramidal Convolution and Self-Attention for Image Sequences with No Word Boundaries.

Interspeech2021 Yu-Xuan Wang, Jun Du, Maokui He, Shutong Niu, Lei Sun, Chin-Hui Lee
Scenario-Dependent Speaker Diarization for DIHARD-III Challenge.

Interspeech2021 Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee
PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification.

Interspeech2021 Xiaoqi Zhang, Jun Du, Li Chai 0002, Chin-Hui Lee
A Maximum Likelihood Approach to SNR-Progressive Learning Using Generalized Gaussian Distribution for LSTM-Based Speech Enhancement.

Interspeech2021 Hengshun Zhou, Jun Du, Hang Chen, Zijun Jing, Shifu Xiong, Chin-Hui Lee
Audio-Visual Information Fusion Using Cross-Modal Teacher-Student Learning for Voice Activity Detection in Realistic Environments.

TASLP2020 Yanhui Tu, Jun Du, Tian Gao, Chin-Hui Lee
A Multi-Target SNR-Progressive Learning Approach to Regression Based Speech Enhancement.

ICASSP2020 Zhong Meng, Hu Hu, Jinyu Li 0001, Changliang Liu, Yan Huang 0028, Yifan Gong 0001, Chin-Hui Lee
L-Vector: Neural Label Embedding for Domain Adaptation.

ICASSP2020 Shutong Niu, Jun Du, Li Chai 0002, Chin-Hui Lee
A Maximum Likelihood Approach to Multi-Objective Learning Using Generalized Gaussian Distributions for Dnn-Based Speech Enhancement.

ICASSP2020 Jun Qi 0002, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee
Tensor-To-Vector Regression for Multi-Channel Speech Enhancement Based on Tensor-Train Network.

ICASSP2020 Yanhui Tu, Jun Du, Chin-Hui Lee
2D-to-2D Mask Estimation for Speech Enhancement Based on Fully Convolutional Neural Network.

ICASSP2020 Xin Wang 0037, Jun Du, Alejandrina Cristià, Lei Sun, Chin-Hui Lee
A Study of Child Speech Extraction Using Joint Speech Enhancement and Separation in Realistic Conditions.

#30  | Jianwu Dang 0001 | Google Scholar   DBLP
VenuesInterspeech: 36ICASSP: 20SpeechComm: 2
Years2022: 212021: 122020: 142019: 32018: 42017: 12016: 3
ISCA Sectionspatial audio: 3speech synthesis: 2asr: 2emotion and sentiment analysis: 2learning techniques for speaker recognition: 2speech processing in the brain: 2cognition and brain studies: 2speech quality assessment: 1speech representation: 1zero, low-resource and multi-modal speech recognition: 1dereverberation, noise reduction, and speaker extraction: 1spoken dialogue systems and multimodality: 1spoken language processing: 1spoken dialogue systems: 1robust speaker recognition: 1targeted source separation: 1speech and voice disorders: 1speech emotion recognition: 1conversational systems: 1single-channel speech enhancement: 1voice and hearing disorders: 1acoustic phonetics: 1speech enhancement: 1adaptation and accommodation in conversation: 1robust speech recognition: 1spoofing detection: 1acoustic and articulatory phonetics: 1speech production models: 1
IEEE Keywordspeech recognition: 11emotion recognition: 7speech emotion recognition: 6speaker recognition: 5natural language processing: 4domain adaptation: 2pattern classification: 2speaker embedding: 2speaker extraction: 2reverberation: 2naturalness: 2speech synthesis: 2recurrent neural nets: 2convolutional neural nets: 2speaker verification: 2meta learning: 2interactive systems: 2representation learning: 2image representation: 2capsule networks: 2center loss: 1array signal processing: 1doa estimation: 1beamforming: 1direction of arrival estimation: 1speaker localizer: 1mutual information: 1multiple references: 1style: 1audio signal processing: 1content: 1transformer: 1task driven loss: 1feature distillation: 1model compression: 1utterance level representation: 1signal classification: 1signal representation: 1double constrained: 1graph theory: 1dialogue level contextual information: 1atmosphere: 1domain invariant: 1multilayer perceptrons: 1meta generalized transformation: 1query processing: 1knowledge retrieval: 1dialogue system: 1natural language generation: 1knowledge based systems: 1multi head attention: 1multi stage: 1time domain: 1signal fusion: 1speech coding: 1pitch prediction: 1speech codecs: 1pitch control: 1image recognition: 1channel attention: 1convolution: 1spectro temporal attention: 1hearing: 1convolutional neural network: 1voice activity detection: 1auditory encoder: 1ear: 1sensor fusion: 1vgg 16: 1graph convolutional: 1multimodal emotion recognition: 1image fusion: 1optimisation: 1cross channel: 1meta speaker embedding network: 1medical signal processing: 1end to end model: 1dysarthric speech recognition: 1articulatory attribute detection: 1self attention: 1multi view: 1time frequency: 1two stage: 1time frequency analysis: 1speech dereverberation: 1multi target learning: 1spectrograms fusion: 1hierarchical model.: 1mandarin dialog act recognition: 1acoustic and lexical context information: 1speech based user interfaces: 1heuristic features: 1convolutional neural network (cnn): 1bottleneck features: 1extreme learning machine (elm): 1vowels: 1speech intelligibility: 1consonants: 1laplace equations: 1articulatory space: 1auditory space: 1speech production: 1
Most Publications2022: 482021: 402020: 322019: 292016: 24

Affiliations
Tianjin University, Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, China
Institut de la Communication Parlée (ICP), Centre National de la Recherche Scientifique (CNRS), France (2002-2003)
Japan Advanced Institute of Science and Technology, JAIST, Japan
Shizuoka University, Japan (PhD 1992)
URLs

SpeechComm2022 Lili Guo, Longbiao Wang, Jianwu Dang 0001, Eng Siong Chng, Seiichi Nakagawa, 
Learning affective representations based on magnitude and dynamic relative phase information for speech emotion recognition.

SpeechComm2022 Wenhuan Lu, Xinyue Zhao, Na Guo, Yongwei Li, Jianguo Wei, Jianhua Tao, Jianwu Dang 0001
One-shot emotional voice conversion based on feature separation.

ICASSP2022 Yuan Gao, Shogo Okada, Longbiao Wang, Jiaxing Liu, Jianwu Dang 0001
Domain-Invariant Feature Learning for Cross Corpus Speech Emotion Recognition.

ICASSP2022 Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang 0001, Haizhou Li 0001, 
L-SpEx: Localized Target Speaker Extraction.

ICASSP2022 Cheng Gong, Longbiao Wang, Zhenhua Ling, Ju Zhang, Jianwu Dang 0001
Using Multiple Reference Audios and Style Embedding Constraints for Speech Synthesis.

ICASSP2022 Yongjie Lv, Longbiao Wang, Meng Ge, Sheng Li 0010, Chenchen Ding, Lixin Pan, Yuguang Wang, Jianwu Dang 0001, Kiyoshi Honda, 
Compressing Transformer-Based ASR Model by Task-Driven Loss and Attention-Based Multi-Level Feature Distillation.

ICASSP2022 Yaodong Song, Jiaxing Liu, Longbiao Wang, Ruiguo Yu, Jianwu Dang 0001
Multi-Stage Graph Representation Learning for Dialogue-Level Speech Emotion Recognition.

ICASSP2022 Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, Hui Chen, 
Learning Domain-Invariant Transformation for Speaker Verification.

ICASSP2022 Xiangyu Zhao, Longbiao Wang, Jianwu Dang 0001
Improving Dialogue Generation via Proactively Querying Grounded Knowledge.

Interspeech2022 Yanjie Fu, Meng Ge, Haoran Yin, Xinyuan Qian, Longbiao Wang, Gaoyan Zhang, Jianwu Dang 0001
Iterative Sound Source Localization for Unknown Number of Sources.

Interspeech2022 Jiaxu He, Cheng Gong, Longbiao Wang, Di Jin 0001, Xiaobao Wang, Junhai Xu, Jianwu Dang 0001
Improve emotional speech synthesis quality by learning explicit and implicit representations with semi-supervised training.

Interspeech2022 Kai Li 0018, Sheng Li 0010, Xugang Lu, Masato Akagi, Meng Liu, Lin Zhang, Chang Zeng, Longbiao Wang, Jianwu Dang 0001, Masashi Unoki, 
Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection.

Interspeech2022 Junjie Li, Meng Ge, Zexu Pan, Longbiao Wang, Jianwu Dang 0001
VCSE: Time-Domain Visual-Contextual Speaker Extraction Network.

Interspeech2022 Nan Li, Meng Ge, Longbiao Wang, Masashi Unoki, Sheng Li 0010, Jianwu Dang 0001
Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network.

Interspeech2022 Siqing Qin, Longbiao Wang, Sheng Li 0010, Yuqin Lin, Jianwu Dang 0001
Finer-grained Modeling units-based Meta-Learning for Low-resource Tibetan Speech Recognition.

Interspeech2022 Hao Shi, Longbiao Wang, Sheng Li 0010, Jianwu Dang 0001, Tatsuya Kawahara, 
Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction.

Interspeech2022 Tongtong Song, Qiang Xu, Meng Ge, Longbiao Wang, Hao Shi, Yongjie Lv, Yuqin Lin, Jianwu Dang 0001
Language-specific Characteristic Assistance for Code-switching Speech Recognition.

Interspeech2022 Shiquan Wang, Yuke Si, Xiao Wei, Longbiao Wang, Zhiqiang Zhuang, Xiaowang Zhang, Jianwu Dang 0001
TopicKS: Topic-driven Knowledge Selection for Knowledge-grounded Dialogue Generation.

Interspeech2022 Xiao Wei, Yuke Si, Shiquan Wang, Longbiao Wang, Jianwu Dang 0001
Hierarchical Tagger with Multi-task Learning for Cross-domain Slot Filling.

Interspeech2022 Qiang Xu, Tongtong Song, Longbiao Wang, Hao Shi, Yuqin Lin, Yongjie Lv, Meng Ge, Qiang Yu 0005, Jianwu Dang 0001
Self-Distillation Based on High-level Information Supervision for Compressing End-to-End ASR Model.

#31  | Jun Du | Google Scholar   DBLP
Venues: Interspeech: 34, ICASSP: 13, TASLP: 9, SpeechComm: 2
Years: 2022: 9, 2021: 10, 2020: 14, 2019: 14, 2018: 7, 2017: 3, 2016: 1
ISCA Section: speaker diarization: 3; speaker recognition: 3; speaker embedding and diarization: 2; speech enhancement: 2; far-field speech recognition: 2; spoken language processing: 1; acoustic scene analysis: 1; low-resource asr development: 1; spoken dialogue systems and multimodality: 1; multimodal systems: 1; tools, corpora and resources: 1; interspeech 2021 deep noise suppression challenge: 1; single-channel speech enhancement: 1; voice activity detection and keyword spotting: 1; asr model training and strategies: 1; acoustic model adaptation for asr: 1; acoustic scene classification: 1; multi-channel speech enhancement: 1; speech emotion recognition: 1; speech coding and evaluation: 1; speech and audio classification: 1; corpus annotation and evaluation: 1; the second dihard speech diarization challenge (dihard ii): 1; deep enhancement: 1; the first dihard speech diarization challenge: 1; source separation and auditory scene analysis: 1; speech enhancement and noise reduction: 1
IEEE Keyword: speech enhancement: 15; speech recognition: 10; speaker recognition: 8; regression analysis: 5; deep neural network: 5; automatic speech recognition: 3; reverberation: 3; speech intelligibility: 3; progressive learning: 3; convolutional neural nets: 3; noise: 3; speech coding: 2; decoding: 2; speaker diarization: 2; voice activity detection: 2; neural net architecture: 2; deep neural network (dnn): 2; entropy: 2; improved minima controlled recursive averaging: 2; neural network: 2; signal classification: 2; recurrent neural nets: 2; post processing: 2; dense structure: 2; least mean squares methods: 2; mean square error methods: 2; maximum likelihood estimation: 2; gaussian distribution: 2; generalized gaussian distribution: 2; ideal ratio mask: 2; gaussian processes: 2; deep learning based speech enhancement: 2; misp challenge: 1; audio visual systems: 1; wake word spotting: 1; public domain software: 1; audio visual: 1; microphone array: 1; ts vad: 1; m2met: 1; time domain: 1; snr constriction: 1; optimisation: 1; probability: 1; acoustic model: 1; cross entropy: 1; robust automatic speech recognition: 1; adaptive noise and speech estimation: 1; memory aware networks: 1; speaker adaptation: 1; computational complexity: 1; snr progressive learning: 1; microphone arrays: 1; matrix algebra: 1; model adaptation: 1; scaling: 1; attention: 1; ctc: 1; dilated convolution: 1; speaker verification: 1; attention mechanism: 1; baum welch statistics: 1; maximum likelihood: 1; multi objective learning: 1; shape factors update: 1; 2d to 2d mapping: 1; fuzzy neural nets: 1; fully convolutional neural network: 1; performance evaluation: 1; child speech extraction: 1; speech separation: 1; measures: 1; realistic conditions: 1; source separation: 1; prediction error modeling: 1; acoustic modeling: 1; joint optimization: 1; bandwidth expansion: 1; mixed bandwidth speech recognition: 1; vector to vector regression: 1; expressive power: 1; function approximation: 1; universal approximation: 1; noise robust speech recognition: 1; teacher student learning: 1; error statistics: 1; improved speech presence probability: 1; gain function: 1; statistical speech enhancement: 1; acoustic noise: 1; signal denoising: 1; multiobjective ensembling: 1; cepstral analysis: 1; speech enhancement (se): 1; hidden markov models: 1; compact and low latency design: 1; multiobjective learning: 1; long short term memory: 1; diarization: 1; overlap detection: 1; densely connected progressive learning: 1; multiple target learning: 1; filtering theory: 1; highly mismatch condition: 1; gender mixture detection: 1; speaker clustering: 1; unsupervised speech separation: 1; mixture models: 1; gender issues: 1; speaker dissimilarity measure: 1
Most Publications: 2020: 61, 2021: 51, 2019: 50, 2022: 49, 2018: 43

Affiliations
URLs

ICASSP2022 Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
The First Multimodal Information Based Speech Processing (Misp) Challenge: Data, Tasks, Baselines And Results.

ICASSP2022 Maokui He, Xiang Lv, Weilin Zhou, Jingjing Yin, Xiaoqi Zhang, Yuxuan Wang, Shutong Niu, Yuhang Cao, Heng Lu, Jun Du, Chin-Hui Lee, 
The USTC-Ximalaya System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription (M2met) Challenge.

ICASSP2022 Zhaoxu Nian, Jun Du, Yu Ting Yeung, Renyu Wang, 
A Time Domain Progressive Learning Approach with SNR Constriction for Single-Channel Speech Enhancement and Recognition.

Interspeech2022 Hang Chen, Jun Du, Yusheng Dai, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Baocai Yin, Jia Pan, 
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.

Interspeech2022 Mao-Kui He, Jun Du, Chin-Hui Lee, 
End-to-End Audio-Visual Neural Speaker Diarization.

Interspeech2022 Yajian Wang, Jun Du, Hang Chen, Qing Wang 0008, Chin-Hui Lee, 
Deep Segment Model for Acoustic Scene Classification.

Interspeech2022 Yanyan Yue, Jun Du, Mao-Kui He, Yu Ting Yeung, Renyu Wang, 
Online Speaker Diarization with Core Samples Selection.

Interspeech2022 Guolong Zhong, Hongyu Song, Ruoyu Wang 0029, Lei Sun, Diyuan Liu, Jia Pan, Xin Fang, Jun Du, Jie Zhang, Lirong Dai, 
External Text Based Data Augmentation for Low-Resource Speech Recognition in the Constrained Condition of OpenASR21 Challenge.

Interspeech2022 Hengshun Zhou, Jun Du, Gongzhen Zou, Zhaoxu Nian, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Shifu Xiong, Jianqing Gao, 
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis.

TASLP2021 Li Chai 0002, Jun Du, Qing-Feng Liu, Chin-Hui Lee, 
A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement.

ICASSP2021 Zhaoxu Nian, Yan-Hui Tu, Jun Du, Chin-Hui Lee, 
A Progressive Learning Approach to Adaptive Noise and Speech Estimation for Speech Enhancement and Noisy Speech Recognition.

Interspeech2021 Hang Chen, Jun Du, Yu Hu 0003, Li-Rong Dai 0001, Bao-Cai Yin, Chin-Hui Lee, 
Automatic Lip-Reading with Hierarchical Pyramidal Convolution and Self-Attention for Image Sequences with No Word Boundaries.

Interspeech2021 Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen 0006, Yanxin Hu, Lei Xie 0001, Jian Wu 0027, Hui Bu, Xin Xu, Jun Du, Jingdong Chen, 
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario.

Interspeech2021 Maokui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen 0006, Shinji Watanabe 0001, 
Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speakers.

Interspeech2021 Koen Oostermeijer, Qing Wang 0008, Jun Du
Lightweight Causal Transformer with Local Self-Attention for Real-Time Speech Enhancement.

Interspeech2021 Neville Ryant, Prachi Singh, Venkat Krishnamohan, Rajat Varma, Kenneth Church 0001, Christopher Cieri, Jun Du, Sriram Ganapathy, Mark Liberman, 
The Third DIHARD Diarization Challenge.

Interspeech2021 Yu-Xuan Wang, Jun Du, Maokui He, Shutong Niu, Lei Sun, Chin-Hui Lee, 
Scenario-Dependent Speaker Diarization for DIHARD-III Challenge.

Interspeech2021 Xiaoqi Zhang, Jun Du, Li Chai 0002, Chin-Hui Lee, 
A Maximum Likelihood Approach to SNR-Progressive Learning Using Generalized Gaussian Distribution for LSTM-Based Speech Enhancement.

Interspeech2021 Hengshun Zhou, Jun Du, Hang Chen, Zijun Jing, Shifu Xiong, Chin-Hui Lee, 
Audio-Visual Information Fusion Using Cross-Modal Teacher-Student Learning for Voice Activity Detection in Realistic Environments.

TASLP2020 Jia Pan, Genshun Wan, Jun Du, Zhongfu Ye, 
Online Speaker Adaptation Using Memory-Aware Networks for Speech Recognition.

#32  | Paavo Alku | Google Scholar   DBLP
Venues: Interspeech: 25, SpeechComm: 16, ICASSP: 10, TASLP: 6
Years: 2022: 4, 2021: 2, 2020: 6, 2019: 13, 2018: 7, 2017: 13, 2016: 12
ISCA Section: voice conversion for style, accent, and emotion: 2; styles, varieties, forensics and tools: 2; glottal source modeling: 2; speech synthesis: 2; atypical speech analysis and detection: 1; phonetics: 1; speech in health: 1; neural techniques for voice conversion and waveform generation: 1; voice quality characterization for clinical voice assessment: 1; speech analysis and representation: 1; voice conversion and speech synthesis: 1; speech pathology, depression, and medical applications: 1; voice conversion: 1; prosody: 1; short utterances speaker recognition: 1; prosody, phonation and voice quality: 1; speech analysis: 1; speech quality & intelligibility: 1; special session: 1; co-inference of production and acoustics: 1; speaker recognition: 1
IEEE Keyword: speech synthesis: 4; speech recognition: 3; vocoders: 3; speaker recognition: 3; filtering theory: 2; quasi closed phase analysis: 2; formant tracking: 2; weighted linear prediction: 2; time varying linear prediction: 2; speech analysis: 2; hidden markov models: 2; text to speech: 2; correlation methods: 2; prediction theory: 2; iterative methods: 1; pattern classification: 1; parkinson's disease: 1; glottal source estimation: 1; diseases: 1; glottal features: 1; multilayer perceptrons: 1; support vector machines: 1; end to end systems: 1; dynamic programming: 1; kalman filters: 1; emotion recognition: 1; emotions: 1; excitation source: 1; glottal closure instants: 1; epochs: 1; children speech recognition: 1; formant modification: 1; dnn: 1; wavenet: 1; feedforward neural nets: 1; glottal source model: 1; f0 estimation: 1; spectral analysis: 1; data augmentation: 1; noise robustness: 1; gan: 1; glottal excitation model: 1; inference mechanisms: 1; neural vocoding: 1; lombard speech: 1; style conversion: 1; pulse model in log domain vocoder: 1; vocal effort: 1; cyclegan: 1; closed phase analysis: 1; quadratic programming (qpr): 1; glottal inverse filtering: 1; quadratic programming: 1; estimation theory: 1; robust feature extraction: 1; fast fourier transforms: 1; linear prediction: 1; mismatch: 1; regression analysis: 1; lombard effect: 1; gaussian processes: 1; speech intelligibility: 1; speech enhancement: 1; telephone speech: 1; high pass filters: 1; intelligibility: 1; recurrent neural nets: 1; adaptation: 1; lombard speech synthesis: 1; lstm tts: 1; probability: 1; i vector: 1; voice conversion: 1; non parallel training: 1; vocal effort mismatch: 1; shouted speech: 1; spectral mapping: 1; acr: 1; speech coding: 1; artificial bandwidth extension: 1; natural language processing: 1; listening test: 1; smoothing methods: 1
Most Publications: 2019: 19, 2014: 19, 2012: 18, 2017: 16, 2016: 16

Affiliations
URLs

SpeechComm2022 Hemant Kumar Kathania, Sudarsana Reddy Kadiri, Paavo Alku, Mikko Kurimo, 
A formant modification method for improved ASR of children's speech.

SpeechComm2022 Mittapalle Kiran Reddy, Hilla Pohjalainen, Pyry Helkkula, Kasimir Kaitue, Mikko Minkkinen, Heli Tolppanen, Tuomo Nieminen, Paavo Alku
Glottal flow characteristics in vowels produced by speakers with heart failure.

Interspeech2022 Farhad Javanmardi, Sudarsana Reddy Kadiri, Manila Kodali, Paavo Alku
Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers.

Interspeech2022 Sudarsana Reddy Kadiri, Farhad Javanmardi, Paavo Alku
Convolutional Neural Networks for Classification of Voice Qualities from Speech and Neck Surface Accelerometer Signals.

SpeechComm2021 Krishna Gurugubelli, Anil Kumar Vuppala, N. P. Narendra, Paavo Alku
Duration of the rhotic approximant /ɹ/ in spastic dysarthria of different severity levels.

TASLP2021 N. P. Narendra, Björn W. Schuller, Paavo Alku
The Detection of Parkinson's Disease From Speech Using Voice Source Information.

SpeechComm2020 Sudarsana Reddy Kadiri, Paavo Alku, B. Yegnanarayana 0001, 
Analysis and classification of phonation types in speech and singing voice.

SpeechComm2020 N. P. Narendra, Paavo Alku
Automatic intelligibility assessment of dysarthric speech using glottal parameters.

TASLP2020 Dhananjaya N. Gowda, Sudarsana Reddy Kadiri, Brad H. Story, Paavo Alku
Time-Varying Quasi-Closed-Phase Analysis for Accurate Formant Tracking in Speech Signals.

ICASSP2020 Sudarsana Reddy Kadiri, Paavo Alku, B. Yegnanarayana 0001, 
Comparison of Glottal Closure Instants Detection Algorithms for Emotional Speech.

ICASSP2020 Hemant Kumar Kathania, Sudarsana Reddy Kadiri, Paavo Alku, Mikko Kurimo, 
Study of Formant Modification for Children ASR.

Interspeech2020 Sudarsana Reddy Kadiri, Rashmi Kethireddy, Paavo Alku
Parkinson's Disease Detection from Speech Using Single Frequency Filtering Cepstral Coefficients.

SpeechComm2019 N. P. Narendra, Manu Airaksinen, Brad H. Story, Paavo Alku
Estimation of the glottal source from coded telephone speech using deep neural networks.

SpeechComm2019 Paavo Alku, Tiina Murtola, Jarmo Malinen, Juha Kuortti, Brad H. Story, Manu Airaksinen, Mika Salmi, Erkki Vilkman, Ahmed Geneid, 
OPENGLOT - An open environment for the evaluation of glottal inverse filtering.

SpeechComm2019 Tiina Murtola, Jarmo Malinen, Ahmed Geneid, Paavo Alku
Analysis of phonation onsets in vowel production, using information from glottal area and flow estimate.

SpeechComm2019 Bajibabu Bollepalli, Lauri Juvela, Manu Airaksinen, Cassia Valentini-Botinhao, Paavo Alku
Normal-to-Lombard adaptation of speech synthesis using long short-term memory recurrent neural networks.

SpeechComm2019 N. P. Narendra, Paavo Alku
Dysarthric speech classification from coded telephone speech using glottal features.

TASLP2019 Lauri Juvela, Bajibabu Bollepalli, Vassilis Tsiaras, Paavo Alku
GlotNet - A Raw Waveform Model for the Glottal Excitation in Statistical Parametric Speech Synthesis.

ICASSP2019 Manu Airaksinen, Lauri Juvela, Paavo Alku, Okko Räsänen, 
Data Augmentation Strategies for Neural Network F0 Estimation.

ICASSP2019 Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
Waveform Generation for Text-to-speech Synthesis Using Pitch-synchronous Multi-scale Generative Adversarial Networks.

#33  | Tomohiro Nakatani | Google Scholar   DBLP
Venues: Interspeech: 29, ICASSP: 24, TASLP: 3, SpeechComm: 1
Years: 2022: 2, 2021: 6, 2020: 12, 2019: 13, 2018: 7, 2017: 10, 2016: 7
ISCA Section: speech enhancement: 3; multi-channel speech enhancement: 2; acoustic models for asr: 2; far-field, robustness and adaptation: 2; dereverberation, noise reduction, and speaker extraction: 1; speech localization, enhancement, and quality assessment: 1; speech enhancement and intelligibility: 1; noise reduction and intelligibility: 1; monaural source separation: 1; diarization: 1; asr for noisy and far-field speech: 1; asr neural network architectures: 1; speech and audio source separation and scene analysis: 1; neural networks for language modeling: 1; adjusting to speaker, accent, and domain: 1; distant asr: 1; speech intelligibility and quality: 1; source separation and auditory scene analysis: 1; far-field speech recognition: 1; speech-enhancement: 1; noise robust and far-field asr: 1; speech intelligibility: 1; acoustic model adaptation: 1; speech enhancement and noise reduction: 1
IEEE Keyword: speech recognition: 18; speaker recognition: 9; speech enhancement: 8; blind source separation: 7; source separation: 6; reverberation: 4; neural network: 4; maximum likelihood estimation: 3; audio signal processing: 3; array signal processing: 3; optimisation: 3; robust asr: 3; backpropagation: 3; natural language processing: 3; covariance analysis: 2; gaussian distribution: 2; microphone arrays: 2; convolution: 2; dynamic stream weights: 2; beamforming: 2; dereverberation: 2; automatic speech recognition: 2; target speech extraction: 2; time domain network: 2; time domain analysis: 2; frequency domain analysis: 2; hidden markov models: 2; joint training: 2; adaptation: 2; auxiliary feature: 2; iterative methods: 2; speaker extraction: 2; gaussian processes: 2; mixture models: 2; expectation maximization (em) algorithm: 1; microphones: 1; blind source separation (bss): 1; full rank spatial covariance analysis (fca): 1; expectation maximisation algorithm: 1; multivariate complex gaussian distribution: 1; multichannel wiener filter: 1; wiener filters: 1; independent component analysis: 1; covariance matrices: 1; joint diagonalization: 1; complex backpropagation: 1; transfer functions: 1; signal to distortion ratio: 1; multi channel source separation: 1; acoustic beamforming: 1; meeting recognition: 1; speaker activity: 1; speech extraction: 1; audio visual systems: 1; audiovisual speaker localization: 1; sensor fusion: 1; data fusion: 1; video signal processing: 1; image fusion: 1; filtering theory: 1; microphone array: 1; spatial features: 1; multi task loss: 1; generalized eigenvalue problem: 1; gaussian noise: 1; overdetermined: 1; block coordinate descent method: 1; independent vector analysis: 1; single channel speech enhancement: 1; signal denoising: 1; student’s t distribution: 1; independent positive semidefinite tensor analysis: 1; tensors: 1; convolutional neural nets: 1; time domain: 1; speech separation: 1; multi speaker speech recognition: 1; computational complexity: 1; end to end speech recognition: 1; tracking: 1; recurrent neural nets: 1; backprop kalman filter: 1; audiovisual speaker tracking: 1; kalman filters: 1; domain adaptation: 1; topic model: 1; text analysis: 1; recurrent neural network language model: 1; sequence summary network: 1; joint optimization: 1; least squares approximations: 1; decoding: 1; encoder decoder: 1; semi supervised learning: 1; autoencoder: 1; encoding: 1; speech synthesis: 1; source counting: 1; online processing: 1; meeting diarization: 1; speech separation/extraction: 1; speaker attention: 1; linear programming: 1; integer programming: 1; integer linear programming (ilp): 1; oracle (upper bound) performance: 1; maximum coverage of content words: 1; compressive speech summarization: 1; acoustic modeling: 1; adaptive training: 1; deep neural network: 1; acoustic model adaptation: 1; feedforward neural nets: 1; speech mixtures: 1; probability: 1; speaker diarization: 1; direction of arrival estimation: 1; microphone array signal processing: 1; maximum likelihood method: 1; spatial filters: 1; speaker adaptive neural network: 1; context adaptation: 1; spatial diffuseness features: 1; cnn based acoustic model: 1; environmental robustness: 1; inverse problems: 1; conditional density: 1; model based feature enhancement: 1; mixture density network: 1; auxiliary features: 1; unsupervised dnn adaptation: 1; multi pass feature enhancement: 1; deep neural networks: 1; convolutional neural network: 1; parametric rectified linear unit: 1; image classification: 1; computer vision: 1; noise robustness: 1
Most Publications: 2021: 33, 2020: 29, 2019: 29, 2018: 25, 2017: 24

Affiliations
URLs

ICASSP2022 Hiroshi Sawada, Rintaro Ikeshita, Keisuke Kinoshita, Tomohiro Nakatani
Multi-Frame Full-Rank Spatial Covariance Analysis for Underdetermined BSS in Reverberant Environments.

Interspeech2022 Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolíková, Hiroshi Sato, Tomohiro Nakatani
Listen only to me! How well can target speech extraction handle false alarms?

TASLP2021 Nobutaka Ito, Rintaro Ikeshita, Hiroshi Sawada, Tomohiro Nakatani
A Joint Diagonalization Based Efficient Approach to Underdetermined Blind Audio Source Separation Using the Multichannel Wiener Filter.

ICASSP2021 Christoph Böddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Reinhold Haeb-Umbach, 
Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.

ICASSP2021 Marc Delcroix, Katerina Zmolíková, Tsubasa Ochiai, Keisuke Kinoshita, Tomohiro Nakatani
Speaker Activity Driven Neural Speech Extraction.

ICASSP2021 Julio Wissing, Benedikt T. Boenninghoff, Dorothea Kolossa, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Christopher Schymura, 
Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain.

Interspeech2021 Christopher Schymura, Benedikt T. Bönninghoff, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa, 
PILOT: Introducing Transformers for Probabilistic Sound Event Localization.

Interspeech2021 Ayako Yamamoto, Toshio Irino, Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani
Comparison of Remote Experiments Using Crowdsourcing and Laboratory Experiments on Speech Intelligibility.

SpeechComm2020 Katsuhiko Yamamoto, Toshio Irino, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani
GEDI: Gammachirp envelope distortion index for predicting intelligibility of enhanced speech.

TASLP2020 Tomohiro Nakatani, Christoph Böddeker, Keisuke Kinoshita, Rintaro Ikeshita, Marc Delcroix, Reinhold Haeb-Umbach, 
Jointly Optimal Denoising, Dereverberation, and Source Separation.

ICASSP2020 Marc Delcroix, Tsubasa Ochiai, Katerina Zmolíková, Keisuke Kinoshita, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki, 
Improving Speaker Discrimination of Target Speech Extraction With Time-Domain Speakerbeam.

ICASSP2020 Rintaro Ikeshita, Tomohiro Nakatani, Shoko Araki, 
Overdetermined Independent Vector Analysis.

ICASSP2020 Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani
Improving Noise Robust Automatic Speech Recognition with Single-Channel Time-Domain Enhancement Network.

ICASSP2020 Tatsuki Kondo, Kanta Fukushige, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Rintaro Ikeshita, Tomohiro Nakatani
Convergence-Guaranteed Independent Positive Semidefinite Tensor Analysis Based on Student's T Distribution.

ICASSP2020 Thilo von Neumann, Keisuke Kinoshita, Lukas Drude, Christoph Böddeker, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach, 
End-to-End Training of Time Domain Audio Separation and Recognition.

ICASSP2020 Christopher Schymura, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa, 
A Dynamic Stream Weight Backprop Kalman Filter for Audiovisual Speaker Tracking.

Interspeech2020 Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Toshio Irino, 
Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-Based ASR System.

Interspeech2020 Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach, 
Multi-Path RNN for Hierarchical Modeling of Long Sequential Data and its Application to Speaker Stream Separation.

Interspeech2020 Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Shoko Araki, 
Computationally Efficient and Versatile Framework for Joint Optimization of Blind Speech Separation and Dereverberation.

Interspeech2020 Thilo von Neumann, Christoph Böddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach, 
Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR.

#34  | Chng Eng Siong | Google Scholar   DBLP
Venues: Interspeech: 30, ICASSP: 21, TASLP: 3, SpeechComm: 2, EMNLP: 1
Years: 2022: 13, 2021: 6, 2020: 10, 2019: 6, 2018: 6, 2017: 3, 2016: 13
ISCA Section: speech enhancement, bandwidth extension and hearing aids: 2; asr neural network architectures: 2; cross-lingual and multilingual asr: 2; spoken term detection: 2; acoustic signal representation and analysis: 1; robust asr, and far-field/multi-talker asr: 1; multimodal speech emotion recognition and paralinguistics: 1; speech segmentation: 1; speech type classification and diagnosis: 1; language and accent recognition: 1; targeted source separation: 1; bi- and multilinguality: 1; acoustic model adaptation for asr: 1; lexicon and language model for speech recognition: 1; speaker and language recognition: 1; neural waveform generation: 1; speech technologies for code-switching in multilingual communities: 1; language modeling: 1; show and tell: 1; source separation from monaural input: 1; speaker recognition evaluation: 1; source separation and voice activity detection: 1; language recognition: 1; robust speaker recognition and anti-spoofing: 1; automatic learning of representations: 1; resources and annotation of resources: 1
IEEE Keyword: speech recognition: 13; speaker recognition: 7; speech enhancement: 4; speaker extraction: 4; automatic speech recognition: 3; speaker embedding: 3; signal reconstruction: 3; entropy: 2; direction of arrival estimation: 2; reverberation: 2; keyword spotting: 2; gaussian processes: 2; natural language processing: 2; information retrieval: 2; emotion recognition: 2; speech emotion recognition: 2; sensor fusion: 2; audio signal processing: 2; time domain: 2; mixture models: 2; optimisation: 1; reinforcement learning: 1; generative adversarial network: 1; contrastive learning: 1; array signal processing: 1; doa estimation: 1; beamforming: 1; speaker localizer: 1; noise robust speech recognition: 1; joint training approach: 1; interactive feature fusion: 1; over suppression phenomenon: 1; small footprint: 1; noisy far field: 1; asr: 1; non autoregressive: 1; transformer: 1; minimum word error: 1; autoregressive processes: 1; code switching: 1; pattern classification: 1; bert: 1; text analysis: 1; interactive systems: 1; multi relations: 1; dialogue relation extraction: 1; co attention mechanism: 1; recurrent neural nets: 1; convolutional neural nets: 1; multi level acoustic information: 1; multimodal fusion: 1; multi stage: 1; signal fusion: 1; representation learning: 1; image recognition: 1; channel attention: 1; convolution: 1; spectro temporal attention: 1; signal representation: 1; adversarial training: 1; disentangled feature learning: 1; signal denoising: 1; speech coding: 1; multi task learning: 1; multi scale: 1; depth wise separable convolution: 1; speech bandwidth extension: 1; multi scale fusion: 1; signal restoration: 1; time domain analysis: 1; independent language model: 1; low resource asr: 1; pre training: 1; catastrophic forgetting: 1; fine tuning: 1; speech separation: 1; spectrum approximation loss: 1; source separation: 1; unsupervised domain adaptation: 1; domain adversarial training: 1; frequency warping: 1; residual compensation: 1; sparse representation: 1; voice conversion: 1; exemplar: 1; interpolation: 1; feature adaptation: 1; linear transform: 1; temporal filtering: 1; robust speech recognition: 1; transforms: 1; short duration utterance: 1; speaker verification: 1; content aware local variability: 1; estimation theory: 1; deep neural network (dnn): 1; large vocabulary continuous speech recognition (lvcsr): 1; under resourced languages: 1; spoken term detection (std): 1; automatic speech recognition (asr): 1; phase: 1; spoofing attack: 1; high dimensional feature: 1; counter measure: 1; spoofing detection: 1; error analysis: 1; sparse matrices: 1; backpropagation: 1; chime 3 challenge: 1; deep neural network: 1; matrix decomposition: 1; compressed sensing: 1; non negative matrix factorization: 1; direction of arrival: 1; mean square error methods: 1; time frequency analysis: 1; eigenvector clustering: 1; spatial covariance: 1; eigenvalues and eigenfunctions: 1; pattern clustering: 1; microphone arrays: 1; covariance matrices: 1; expectation maximisation algorithm: 1; expectation maximization: 1; query processing: 1; spoken term detection: 1; data augmentation: 1; time series: 1; dtw: 1; partial matching: 1; query by example: 1
Most Publications: 2022: 38, 2015: 28, 2016: 27, 2020: 22, 2021: 21


SpeechComm2022 Lili Guo, Longbiao Wang, Jianwu Dang 0001, Eng Siong Chng, Seiichi Nakagawa, 
Learning affective representations based on magnitude and dynamic relative phase information for speech emotion recognition.

ICASSP2022 Chen Chen, Yuchen Hu, Nana Hou, Xiaofeng Qi, Heqing Zou, Eng Siong Chng
Self-Critical Sequence Training for Automatic Speech Recognition.

ICASSP2022 Chen Chen, Nana Hou, Yuchen Hu, Shashank Shirol, Eng Siong Chng
Noise-Robust Speech Recognition With 10 Minutes Unparalleled In-Domain Data.

ICASSP2022 Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang 0001, Haizhou Li 0001, 
L-SpEx: Localized Target Speaker Extraction.

ICASSP2022 Yuchen Hu, Nana Hou, Chen Chen, Eng Siong Chng
Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition.

ICASSP2022 Dianwen Ng, Yunqi Chen, Biao Tian, Qiang Fu 0001, Eng Siong Chng
Convmixer: Feature Interactive Convolution with Curriculum Learning for Small Footprint and Noisy Far-Field Keyword Spotting.

ICASSP2022 Yizhou Peng, Jicheng Zhang, Haihua Xu, Hao Huang, Eng Siong Chng
Minimum Word Error Training For Non-Autoregressive Transformer-Based Code-Switching ASR.

ICASSP2022 Fuzhao Xue, Aixin Sun, Hao Zhang 0048, Jinjie Ni, Eng Siong Chng
An Embarrassingly Simple Model for Dialogue Relation Extraction.

ICASSP2022 Heqing Zou, Yuke Si, Chen Chen, Deepu Rajan, Eng Siong Chng
Speech Emotion Recognition with Co-Attention Based Multi-Level Acoustic Information.

Interspeech2022 Chen Chen, Nana Hou, Yuchen Hu, Heqing Zou, Xiaofeng Qi, Eng Siong Chng
Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning.

Interspeech2022 Zixun Guo, Chen Chen, Eng Siong Chng
DENT-DDSP: Data-efficient noisy speech generator using differentiable digital signal processors for explicit distortion modelling and noise-robust speech recognition.

Interspeech2022 Tarun Gupta, Duc-Tuan Truong, Tran The Anh, Eng Siong Chng
Estimation of speaker age and height from speech signal using bi-encoder transformer mixture model.

Interspeech2022 Yang Xiao, Nana Hou, Eng Siong Chng
Rainbow Keywords: Efficient Incremental Learning for Online Spoken Keyword Spotting.

ICASSP2021 Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang 0001, Haizhou Li 0001, 
Multi-Stage Speaker Extraction with Utterance and Frame-Level Reference Signals.

ICASSP2021 Lili Guo, Longbiao Wang, Chenglin Xu, Jianwu Dang 0001, Eng Siong Chng, Haizhou Li 0001, 
Representation Learning with Spectro-Temporal-Channel Attention for Speech Emotion Recognition.

ICASSP2021 Nana Hou, Chenglin Xu, Eng Siong Chng, Haizhou Li 0001, 
Learning Disentangled Feature Representations for Speech Enhancement Via Adversarial Training.

Interspeech2021 Weiguang Chen, Van Tung Pham, Eng Siong Chng, Xionghu Zhong, 
Overlapped Speech Detection Based on Spectral and Spatial Feature Fusion.

Interspeech2021 Jicheng Zhang, Yizhou Peng, Van Tung Pham, Haihua Xu, Hao Huang, Eng Siong Chng
E2E-Based Multi-Task Learning Approach to Joint Speech and Accent Recognition.

EMNLP2021 Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq R. Joty, Eng Siong Chng, Bin Ma 0001, 
A Unified Speaker Adaptation Approach for ASR.

TASLP2020 Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li 0001, 
SpEx: Multi-Scale Time Domain Speaker Extraction Network.

#35  | Yifan Gong 0001 | Google Scholar   DBLP
VenuesICASSP: 30Interspeech: 25TASLP: 1
Years2022: 22021: 122020: 112019: 142018: 72017: 52016: 5
ISCA Sectionasr neural network architectures: 3streaming for asr/rnn transducers: 2multi- and cross-lingual asr, other topics in asr: 2novel models and training methods for asr: 1topics in asr: 1self-supervision and semi-supervision for neural asr training: 1neural network training methods for asr: 1acoustic model adaptation for asr: 1streaming asr: 1feature extraction and distant asr: 1asr neural network architectures and training: 1search for speech recognition: 1spoken term detection, confidence measure, and end-to-end speech recognition: 1asr neural network training: 1novel neural network architectures for acoustic modelling: 1novel approaches to enhancement: 1deep enhancement: 1noise reduction: 1styles, varieties, forensics and tools: 1acoustic model adaptation: 1low resource speech recognition: 1
IEEE Keywords: speech recognition: 25; recurrent neural nets: 10; speaker recognition: 8; speaker adaptation: 6; deep neural network: 5; natural language processing: 4; probability: 4; lstm: 4; automatic speech recognition: 4; attention: 4; ctc: 4; signal classification: 4; teacher student learning: 3; vocabulary: 3; adversarial learning: 3; sequence training: 2; end to end: 2; meeting transcription: 2; speech synthesis: 2; decoding: 2; adaptation: 2; acoustic to word: 2; speech coding: 2; oov: 2; digital assistant: 2; neural network: 2; dnn: 2; acoustic modeling: 2; end to end training: 2; deep neural networks: 2; matrix algebra: 2; streaming: 1; multi talker asr: 1; end to end end point detection: 1; recurrent neural network transducer: 1; attention based encoder decoder: 1; language model: 1; regularization: 1; self teaching: 1; combination: 1; segmentation: 1; diarisation: 1; sound source localisation: 1; hidden markov model: 1; hidden markov models: 1; microphone arrays: 1; speaker location: 1; filtering theory: 1; audio signal processing: 1; speaker diarization: 1; source separation: 1; speech separation: 1; system fusion: 1; acoustic model adaptation: 1; unsupervised learning: 1; neural language generation: 1; streaming attention based sequence to sequence asr: 1; pattern classification: 1; encoding: 1; latency reduction: 1; monotonic chunkwise attention: 1; latency: 1; computer aided instruction: 1; entropy: 1; domain adaptation: 1; backpropagation: 1; knowledge representation: 1; label embedding: 1; keyword spotting: 1; text analysis: 1; text to speech: 1; rnn t: 1; end to end system: 1; universal acoustic model: 1; mixture models: 1; interpolation: 1; mixture of experts: 1; confidence classifier: 1; word embedding: 1; model adaptation: 1; model combination: 1; acoustic state prediction: 1; senone classification: 1; future context frames: 1; layer trajectory: 1; temporal modeling: 1; asr: 1; language identification: 1; code switching: 1; domain invariant training: 1; speaker verification: 1; application program interfaces: 1; privacy preserving: 1; cloud computing: 1; quantization: 1; encryption: 1; polynomials: 1; cryptography: 1; far field: 1; acoustic model: 1; spotting: 1; data compression: 1; speaker invariant training: 1; adversarial learning: 1; mean square error methods: 1; cepstra minimum mean square error: 1; cepstral analysis: 1; smoothing methods: 1; noise robustness: 1; feedforward neural nets: 1; recurrent neural network: 1; long short term memory: 1; non negativity: 1; personalization: 1; long short term memory (lstm): 1; recurrent neural networks: 1; support vector machines: 1; maximum margin: 1; svm: 1
Most Publications: 2019: 32, 2020: 26, 2021: 24, 2018: 23, 2017: 13

Affiliations
Microsoft Corporation, Redmond, WA, USA
Texas Instruments Inc., Dallas, TX, USA
INRIA-Lorraine, Nancy, France
Henri Poincaré University, Department of Mathematics and Computer Science, Nancy, France (PhD)

ICASSP2022 Liang Lu 0001, Jinyu Li 0001, Yifan Gong 0001
Endpoint Detection for Streaming End-to-End Multi-Talker ASR.

Interspeech2022 Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li 0001, Xie Chen 0001, Yu Wu 0012, Yifan Gong 0001
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.

ICASSP2021 Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu 0001, Xie Chen 0001, Jinyu Li 0001, Yifan Gong 0001
Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition.

ICASSP2021 Eric Sun, Liang Lu 0001, Zhong Meng, Yifan Gong 0001
Sequence-Level Self-Teaching Regularization.

ICASSP2021 Jeremy H. M. Wong, Dimitrios Dimitriadis, Ken'ichi Kumatani, Yashesh Gaur, George Polovets, Partha Parthasarathy, Eric Sun, Jinyu Li 0001, Yifan Gong 0001
Ensemble Combination between Different Time Segmentations.

ICASSP2021 Jeremy H. M. Wong, Xiong Xiao, Yifan Gong 0001
Hidden Markov Model Diarisation with Speaker Location Information.

ICASSP2021 Xiong Xiao, Naoyuki Kanda, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao 0008, Gang Liu, Yu Wu 0012, Jian Wu 0027, Shujie Liu 0001, Jinyu Li 0001, Yifan Gong 0001
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.

Interspeech2021 Liang Lu 0001, Naoyuki Kanda, Jinyu Li 0001, Yifan Gong 0001
Streaming Multi-Talker Speech Recognition with Joint Speaker Identification.

Interspeech2021 Liang Lu 0001, Zhong Meng, Naoyuki Kanda, Jinyu Li 0001, Yifan Gong 0001
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer.

Interspeech2021 Yan Huang 0028, Guoli Ye, Jinyu Li 0001, Yifan Gong 0001
Rapid Speaker Adaptation for Conformer Transducer: Attention and Bias Are All You Need.

Interspeech2021 Yan Deng, Rui Zhao 0017, Zhong Meng, Xie Chen 0001, Bing Liu, Jinyu Li 0001, Yifan Gong 0001, Lei He 0005, 
Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS.

Interspeech2021 Vikas Joshi, Amit Das, Eric Sun, Rupesh R. Mehta, Jinyu Li 0001, Yifan Gong 0001
Multiple Softmax Architecture for Streaming Multilingual End-to-End ASR Systems.

Interspeech2021 Zhong Meng, Yu Wu 0012, Naoyuki Kanda, Liang Lu 0001, Xie Chen 0001, Guoli Ye, Eric Sun, Jinyu Li 0001, Yifan Gong 0001
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition.

Interspeech2021 Eric Sun, Jinyu Li 0001, Zhong Meng, Yu Wu 0012, Jian Xue, Shujie Liu 0001, Yifan Gong 0001
Improving Multilingual Transformer Transducer Models by Reducing Language Confusions.

ICASSP2020 Yan Huang 0028, Lei He 0005, Wenning Wei, William Gale, Jinyu Li 0001, Yifan Gong 0001
Using Personalized Speech Synthesis and Neural Language Generator for Rapid Speaker Adaptation.

ICASSP2020 Hirofumi Inaguma, Yashesh Gaur, Liang Lu 0001, Jinyu Li 0001, Yifan Gong 0001
Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR.

ICASSP2020 Jinyu Li 0001, Rui Zhao 0017, Eric Sun, Jeremy H. M. Wong, Amit Das, Zhong Meng, Yifan Gong 0001
High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model.

ICASSP2020 Zhong Meng, Hu Hu, Jinyu Li 0001, Changliang Liu, Yan Huang 0028, Yifan Gong 0001, Chin-Hui Lee, 
L-Vector: Neural Label Embedding for Domain Adaptation.

ICASSP2020 Eva Sharma, Guoli Ye, Wenning Wei, Rui Zhao 0017, Yao Tian, Jian Wu 0027, Lei He 0005, Ed Lin, Yifan Gong 0001
Adaptation of RNN Transducer with Text-To-Speech Technology for Keyword Spotting.

Interspeech2020 Yan Huang 0028, Jinyu Li 0001, Lei He 0005, Wenning Wei, William Gale, Yifan Gong 0001
Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator.

#36  | Sriram Ganapathy | Google Scholar   DBLP
Venues: Interspeech: 36, ICASSP: 16, TASLP: 2, SpeechComm: 1
Years: 2022: 11, 2021: 9, 2020: 11, 2019: 10, 2018: 8, 2017: 3, 2016: 3
ISCA Sections: speaker diarization: 3; the first dicova challenge: 2; feature extraction and distant asr: 2; speaker recognition: 2; the second dihard speech diarization challenge (dihard ii): 2; speaker verification: 2; voice conversion and adaptation: 1; low-resource asr development: 1; show and tell: 1; speech and language in health: 1; robust asr, and far-field/multi-talker asr: 1; atypical speech detection: 1; non-intrusive objective speech quality assessment (nisqa) challenge for online conferencing applications: 1; survey talk: 1; conferencingspeech 2021 challenge: 1; spoken language understanding: 1; language learning: 1; speech and voice disorders: 1; feature extraction for asr: 1; speaker recognition and anti-spoofing: 1; spoken term detection, confidence measure, and end-to-end speech recognition: 1; speaker and language recognition: 1; neural network training strategies for asr: 1; second language acquisition and code-switching: 1; perspective talk: 1; distant asr: 1; noise robust speech recognition: 1; speaker and language recognition applications: 1; far-field, robustness and adaptation: 1
IEEE Keywords: speaker recognition: 9; speech recognition: 6; natural language processing: 4; signal representation: 3; deep learning (artificial intelligence): 3; speaker diarization: 3; text analysis: 2; iterative methods: 2; supervised learning: 2; autoregressive processes: 2; pattern clustering: 2; self supervised learning: 2; recurrent neural nets: 2; filtering theory: 2; regression analysis: 2; support vector regression: 2; gaussian processes: 2; support vector machines: 2; linear discriminant analysis: 2; overlap detection: 2; emotion recognition: 1; transformer networks: 1; self attention models: 1; multi modal emotion recognition: 1; learnable front end: 1; prediction theory: 1; representation learning: 1; contrastive predictive coding: 1; zerospeech challenge: 1; deep clustering: 1; convolutional neural nets: 1; dereverberation: 1; frequency domain linear prediction (fdlp): 1; end to end automatic speech recognition: 1; joint modeling: 1; reverberation: 1; healthcare: 1; pattern classification: 1; signal classification: 1; machine learning: 1; audio recording: 1; covid 19: 1; audio signal processing: 1; respiratory diagnosis: 1; optimisation: 1; graph structural clustering: 1; path integral clustering: 1; feature selection: 1; feedback of acoustic embeddings: 1; 2 stage relevance weighting: 1; raw speech waveform: 1; speech representation learning: 1; electroencephalography: 1; medical signal processing: 1; canonical correlation analysis (cca): 1; audio eeg analysis: 1; multi way cca: 1; deep cca: 1; neurophysiology: 1; sequence modeling: 1; long short term memory (lstm) networks: 1; language recognition: 1; attention networks: 1; i vectors: 1; human versus machine: 1; hearing: 1; speech intelligibility: 1; language familiarity: 1; parameter estimation: 1; response time: 1; benchmarking speaker diarization: 1; talker change detection: 1; end to end asr: 1; singing voice separation: 1; speech separation: 1; time frequency analysis: 1; unsupervised filter learning: 1; convolutional variational autoencoder: 1; modulation filtering: 1; robust speech recognition: 1; skip connections: 1; automatic joint height and age estimation: 1; short duration: 1; deep neural network: 1; end to end language identification: 1; hierarchical gru: 1; attention: 1; speaker verification: 1; probability: 1; dimensionality reduction: 1; x vectors: 1; gaussian back end: 1; gaussian distribution: 1; plda scoring: 1; diarization: 1; speech enhancement: 1; automatic speech recognition: 1; neural net architecture: 1; mixture models: 1; lstm modeling: 1; conversational speech analysis: 1; joint factor analysis: 1; ivectors: 1; spoof detection: 1; speech synthesis: 1; security of data: 1; speaker verification: 1; i vector: 1; age estimation: 1; deep neural networks: 1
Most Publications: 2020: 31, 2021: 25, 2022: 19, 2019: 19, 2018: 10


ICASSP2022 Soumya Dutta, Sriram Ganapathy
Multimodal Transformer with Learnable Frontend and Self Attention for Emotion Recognition.

ICASSP2022 Varun Krishna, Sriram Ganapathy
Self Supervised Representation Learning with Deep Clustering for Acoustic Unit Discovery from Raw Speech.

ICASSP2022 Rohit Kumar, Anurenjan Purushothaman, Anirudh Sreeram, Sriram Ganapathy
End-To-End Speech Recognition with Joint Dereverberation of Sub-Band Autoregressive Envelopes.

ICASSP2022 Neeraj Kumar Sharma 0001, Srikanth Raj Chetupalli, Debarpan Bhattacharya, Debottam Dutta, Pravin Mote, Sriram Ganapathy
The Second Dicova Challenge: Dataset and Performance Analysis for Diagnosis of Covid-19 Using Acoustics.

Interspeech2022 Shrutina Agarwal, Naoya Takahashi, Sriram Ganapathy
Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer.

Interspeech2022 Tarun Sai Bandarupalli, Shakti Rath, Nirmesh Shah, Naoyuki Onoe, Sriram Ganapathy
Semi-supervised Acoustic and Language Modeling for Hindi ASR.

Interspeech2022 Debarpan Bhattacharya, Debottam Dutta, Neeraj Kumar Sharma 0001, Srikanth Raj Chetupalli, Pravin Mote, Sriram Ganapathy, Chandrakiran C, Sahiti Nori, Suhail K. K, Sadhana Gonuguntla, Murali Alagesan, 
Coswara: A website application enabling COVID-19 screening by analysing respiratory sound samples and health symptoms.

Interspeech2022 Debarpan Bhattacharya, Debottam Dutta, Neeraj Kumar Sharma 0001, Srikanth Raj Chetupalli, Pravin Mote, Sriram Ganapathy, Chandrakiran C, Sahiti Nori, Suhail K. K, Sadhana Gonuguntla, Murali Alagesan, 
Analyzing the impact of SARS-CoV-2 variants on respiratory sound signals.

Interspeech2022 Srikanth Raj Chetupalli, Sriram Ganapathy
Speaker conditioned acoustic modeling for multi-speaker conversational ASR.

Interspeech2022 Debottam Dutta, Debarpan Bhattacharya, Sriram Ganapathy, Amir Hossein Poorjam, Deepak Mittal, Maneesh Singh 0001, 
Acoustic Representation Learning on Breathing and Speech Signals for COVID-19 Detection.

Interspeech2022 M. K. Jayesh, Mukesh Sharma, Praneeth Vonteddu, Mahaboob Ali Basha Shaik, Sriram Ganapathy
Transformer Networks for Non-Intrusive Speech Quality Prediction.

TASLP2021 Prachi Singh, Sriram Ganapathy
Self-Supervised Representation Learning With Path Integral Clustering for Speaker Diarization.

ICASSP2021 Purvi Agrawal, Sriram Ganapathy
Representation Learning for Speech Recognition Using Feedback Based Relevance Weighting.

ICASSP2021 Jaswanth Reddy Katthi, Sriram Ganapathy
Deep Multiway Canonical Correlation Analysis For Multi-Subject Eeg Normalization.

Interspeech2021 Flávio Ávila, Amir H. Poorjam, Deepak Mittal, Charles Dognin, Ananya Muguli, Rohit Kumar, Srikanth Raj Chetupalli, Sriram Ganapathy, Maneesh Singh 0001, 
Investigating Feature Selection and Explainability for COVID-19 Diagnostics from Cough Sounds.

Interspeech2021 Sriram Ganapathy
Uncovering the Acoustic Cues of COVID-19 Infection.

Interspeech2021 Ananya Muguli, Lancelot Pinto, Nirmala R., Neeraj Kumar Sharma 0001, Prashant Krishnan V, Prasanta Kumar Ghosh, Rohit Kumar, Shrirama Bhat, Srikanth Raj Chetupalli, Sriram Ganapathy, Shreyas Ramoji, Viral Nanda, 
DiCOVA Challenge: Dataset, Task, and Baseline System for COVID-19 Diagnosis Using Acoustics.

Interspeech2021 R. G. Prithvi Raj, Rohit Kumar, M. K. Jayesh, Anurenjan Purushothaman, Sriram Ganapathy, M. Ali Basha Shaik, 
SRIB-LEAP Submission to Far-Field Multi-Channel Speech Enhancement Challenge for Video Conferencing.

Interspeech2021 Neville Ryant, Prachi Singh, Venkat Krishnamohan, Rajat Varma, Kenneth Church 0001, Christopher Cieri, Jun Du, Sriram Ganapathy, Mark Liberman, 
The Third DIHARD Diarization Challenge.

Interspeech2021 Prachi Singh, Rajat Varma, Venkat Krishnamohan, Srikanth Raj Chetupalli, Sriram Ganapathy
LEAP Submission for the Third DIHARD Diarization Challenge.

#37  | Zhen-Hua Ling | Google Scholar   DBLP
Venues: Interspeech: 21, ICASSP: 16, TASLP: 13, SpeechComm: 2, AAAI: 1, EMNLP: 1
Years: 2022: 9, 2021: 11, 2020: 10, 2019: 7, 2018: 7, 2017: 2, 2016: 8
ISCA Sections: speech synthesis: 10; voice conversion and adaptation: 2; asr model training and strategies: 1; acoustic model adaptation for asr: 1; speaker recognition: 1; corpus annotation and evaluation: 1; singing and multimodal synthesis: 1; voice conversion and speech synthesis: 1; speech synthesis paradigms and methods: 1; wavenet and novel paradigms: 1; special session: 1
IEEE Keywords: speech synthesis: 20; natural language processing: 7; speech recognition: 7; vocoders: 6; voice conversion: 5; neural network: 5; hidden markov models: 5; hidden markov model: 5; sequence to sequence: 4; speech coding: 4; statistical parametric speech synthesis: 3; speaker recognition: 3; text analysis: 3; recurrent neural nets: 3; speech enhancement: 2; voice activity detection: 2; transformer: 2; naturalness: 2; audio signal processing: 2; diseases: 2; deep learning (artificial intelligence): 2; variational autoencoder: 2; knowledge representation: 2; response selection: 2; interactive systems: 2; tacotron: 2; unit selection: 2; bottleneck features: 2; vocoder: 2; fourier transforms: 2; gaussian processes: 2; computational linguistics: 2; deep belief network: 2; belief networks: 2; feedforward neural nets: 2; denoising: 1; reverberation: 1; dereverberation: 1; transient response: 1; neural vocoder: 1; signal denoising: 1; cyclic training: 1; recognition synthesis: 1; any to one: 1; supervised learning: 1; self supervised training: 1; grapheme to phoneme conversion: 1; pre trained grapheme model: 1; mutual information: 1; multiple references: 1; style: 1; content: 1; speech: 1; image representation: 1; gaze tracking: 1; sensor fusion: 1; bimodal fusion: 1; bottleneck feature: 1; eye tracking: 1; dementia detection: 1; neurophysiology: 1; fastspeech: 1; discourse level modeling: 1; prosody modeling: 1; wavelet transforms: 1; anti spoofing: 1; convolutional neural nets: 1; adversarial example generation: 1; multiple participants: 1; deep contextualized utterance representations: 1; object oriented methods: 1; dialogue success: 1; dialogue disentanglement: 1; dialogue system technology challenge: 1; bert: 1; text to speech: 1; autoregressive processes: 1; pitch prediction: 1; speech codecs: 1; pitch control: 1; speech analysis: 1; data augmentation: 1; support vector machines: 1; alzheimer's disease: 1; phase spectrum: 1; amplitude spectrum: 1; software agents: 1; pattern matching: 1; information retrieval: 1; dialogue: 1; interactive matching network: 1; utterance to utterance: 1; signal representation: 1; adversarial training: 1; sequence to sequence (seq2seq): 1; disentangle: 1; linguistics: 1; deep neural network: 1; mel spectrogram: 1; attention: 1; neural waveform generator: 1; wavernn: 1; decoding: 1; signal reconstruction: 1; waveform generators: 1; multiple target learning: 1; quantisation (signal): 1; inverse transforms: 1; spectral enhancement: 1; dnn: 1; text supervision: 1; style transfer: 1; unsupervised learning: 1; deep auto encoder: 1; binary distributed hidden units: 1; speech bandwidth extension: 1; recurrent neural networks: 1; dilated convolutional neural networks: 1; word processing: 1; long short term memory: 1; syntax structure: 1; grammars: 1; sentence modeling: 1; recurrent neural network: 1; wavenet: 1; samplernn: 1; convolutional codes: 1; cepstral analysis: 1; what where auto encoder: 1; spectral envelope: 1; convolution neural network: 1; spoofing attack: 1; speaker verification: 1; postfilter: 1; restricted boltzmann machine: 1; filtering theory: 1; modulation: 1; compensation: 1; line spectral pair: 1; modulation spectrum: 1; linear transform: 1; model clustering: 1; pattern clustering: 1; singing voice synthesis: 1
Most Publications: 2020: 38, 2021: 33, 2019: 33, 2018: 30, 2022: 28


TASLP2022 Yang Ai, Zhen-Hua Ling, Wei-Lu Wu, Ang Li, 
Denoising-and-Dereverberation Hierarchical Neural Vocoder for Statistical Parametric Speech Synthesis.

ICASSP2022 Yan-Nian Chen, Li-Juan Liu, Ya-Jun Hu, Yuan Jiang 0006, Zhen-Hua Ling
Improving Recognition-Synthesis Based any-to-one Voice Conversion with Cyclic Training.

ICASSP2022 Lu Dong, Zhiqiang Guo, Chao-Hong Tan, Ya-Jun Hu, Yuan Jiang 0006, Zhen-Hua Ling
Neural Grapheme-To-Phoneme Conversion with Pre-Trained Grapheme Models.

ICASSP2022 Cheng Gong, Longbiao Wang, Zhenhua Ling, Ju Zhang, Jianwu Dang 0001, 
Using Multiple Reference Audios and Style Embedding Constraints for Speech Synthesis.

ICASSP2022 Zhengyan Sheng, Zhiqiang Guo, Xin Li 0064, Yunxia Li, Zhenhua Ling
Dementia Detection by Fusing Speech and Eye-Tracking Representation.

ICASSP2022 Ning-Qian Wu, Zhaoci Liu, Zhen-Hua Ling
Discourse-Level Prosody Modeling with a Variational Autoencoder for Non-Autoregressive Expressive Speech Synthesis.

Interspeech2022 Chang Liu, Zhen-Hua Ling, Ling-Hui Chen, 
Pronunciation Dictionary-Free Multilingual Speech Synthesis by Combining Unsupervised and Supervised Phonetic Representations.

Interspeech2022 Zhaoci Liu, Ning-Qian Wu, Yajie Zhang, Zhenhua Ling
Integrating Discrete Word-Level Style Variations into Non-Autoregressive Acoustic Models for Speech Synthesis.

Interspeech2022 Yukun Peng, Zhenhua Ling
Decoupled Pronunciation and Prosody Modeling in Meta-Learning-based Multilingual Speech Synthesis.

TASLP2021 Yi-Yang Ding, Hao-Jian Lin, Li-Juan Liu, Zhen-Hua Ling, Yu Hu 0003, 
Robustness of Speech Spoofing Detectors Against Adversarial Post-Processing of Voice Conversion.

TASLP2021 Jia-Chen Gu, Tianda Li, Zhen-Hua Ling, Quan Liu, Zhiming Su, Yu-Ping Ruan, Xiaodan Zhu, 
Deep Contextualized Utterance Representations for Response Selection and Dialogue Analysis.

TASLP2021 Yajie Zhang, Zhen-Hua Ling
Extracting and Predicting Word-Level Style Variations for Speech Synthesis.

TASLP2021 Xiao Zhou, Zhen-Hua Ling, Li-Rong Dai 0001, 
UnitNet: A Sequence-to-Sequence Acoustic Model for Concatenative Speech Synthesis.

ICASSP2021 Cheng Gong, Longbiao Wang, Zhenhua Ling, Shaotong Guo, Ju Zhang 0001, Jianwu Dang 0001, 
Improving Naturalness and Controllability of Sequence-to-Sequence Speech Synthesis by Learning Local Prosody Representations.

ICASSP2021 Zhaoci Liu, Zhiqiang Guo, Zhenhua Ling, Yunxia Li, 
Detecting Alzheimer's Disease from Speech Using Neural Networks with Bottleneck Features and Data Augmentation.

Interspeech2021 Yue Chen, Zhen-Hua Ling, Qing-Feng Liu, 
A Neural-Network-Based Approach to Identifying Speakers in Novels.

Interspeech2021 Yi-Yang Ding, Li-Juan Liu, Yu Hu 0003, Zhen-Hua Ling
Adversarial Voice Conversion Against Neural Spoofing Detectors.

Interspeech2021 Xiao Zhou, Zhen-Hua Ling, Li-Rong Dai 0001, 
UnitNet-Based Hybrid Speech Synthesis.

AAAI2021 Jing-Xuan Zhang, Korin Richmond, Zhen-Hua Ling, Lirong Dai 0001, 
TaLNet: Voice Reconstruction from Tongue and Lip Articulation with Transfer Learning from Text-to-Speech Synthesis.

EMNLP2021 Jia-Chen Gu, Zhen-Hua Ling, Yu Wu, Quan Liu, Zhigang Chen 0003, Xiaodan Zhu, 
Detecting Speaker Personas from Conversational Texts.

#38  | Longbiao Wang | Google Scholar   DBLP
Venues: Interspeech: 32, ICASSP: 20, SpeechComm: 1
Years: 2022: 21, 2021: 13, 2020: 12, 2019: 2, 2018: 4, 2016: 1
ISCA Sections: spatial audio: 3; speech synthesis: 3; asr: 2; emotion and sentiment analysis: 2; dnn architectures for speaker recognition: 2; speech quality assessment: 1; speech representation: 1; zero, low-resource and multi-modal speech recognition: 1; dereverberation, noise reduction, and speaker extraction: 1; spoken dialogue systems and multimodality: 1; spoken language processing: 1; spoken dialogue systems: 1; robust speaker recognition: 1; targeted source separation: 1; speech and voice disorders: 1; speech emotion recognition: 1; single-channel speech enhancement: 1; voice and hearing disorders: 1; learning techniques for speaker recognition: 1; speech enhancement: 1; adaptation and accommodation in conversation: 1; robust speech recognition: 1; spoofing detection: 1; cognition and brain studies: 1; speaker diarization and recognition: 1
IEEE Keywords: speech recognition: 12; emotion recognition: 7; speech emotion recognition: 6; speaker recognition: 5; natural language processing: 4; speech synthesis: 3; domain adaptation: 2; pattern classification: 2; speaker embedding: 2; speaker extraction: 2; reverberation: 2; naturalness: 2; recurrent neural nets: 2; convolutional neural nets: 2; speaker verification: 2; meta learning: 2; interactive systems: 2; representation learning: 2; image representation: 2; capsule networks: 2; center loss: 1; array signal processing: 1; doa estimation: 1; beamforming: 1; direction of arrival estimation: 1; speaker localizer: 1; mutual information: 1; multiple references: 1; style: 1; audio signal processing: 1; content: 1; transformer: 1; task driven loss: 1; feature distillation: 1; model compression: 1; utterance level representation: 1; signal classification: 1; signal representation: 1; double constrained: 1; graph theory: 1; dialogue level contextual information: 1; atmosphere: 1; style disentanglement: 1; automatic speech recognition: 1; style modeling: 1; expressive speech synthesis: 1; domain invariant: 1; multilayer perceptrons: 1; meta generalized transformation: 1; query processing: 1; knowledge retrieval: 1; dialogue system: 1; natural language generation: 1; knowledge based systems: 1; multi head attention: 1; multi stage: 1; time domain: 1; signal fusion: 1; speech coding: 1; pitch prediction: 1; speech codecs: 1; pitch control: 1; image recognition: 1; channel attention: 1; convolution: 1; spectro temporal attention: 1; hearing: 1; convolutional neural network: 1; voice activity detection: 1; auditory encoder: 1; ear: 1; sensor fusion: 1; vgg 16: 1; graph convolutional: 1; multimodal emotion recognition: 1; image fusion: 1; optimisation: 1; cross channel: 1; meta speaker embedding network: 1; medical signal processing: 1; end to end model: 1; dysarthric speech recognition: 1; articulatory attribute detection: 1; self attention: 1; multi view: 1; time frequency: 1; two stage: 1; time frequency analysis: 1; speech dereverberation: 1; multi target learning: 1; spectrograms fusion: 1; hierarchical model: 1; mandarin dialog act recognition: 1; acoustic and lexical context information: 1; speech based user interfaces: 1; heuristic features: 1; convolutional neural network (cnn): 1; bottleneck features: 1; extreme learning machine (elm): 1
Most Publications: 2022: 44, 2021: 35, 2020: 24, 2019: 19, 2018: 16

Affiliations
Nagaoka University of Technology

SpeechComm2022 Lili Guo, Longbiao Wang, Jianwu Dang 0001, Eng Siong Chng, Seiichi Nakagawa, 
Learning affective representations based on magnitude and dynamic relative phase information for speech emotion recognition.

ICASSP2022 Yuan Gao, Shogo Okada, Longbiao Wang, Jiaxing Liu, Jianwu Dang 0001, 
Domain-Invariant Feature Learning for Cross Corpus Speech Emotion Recognition.

ICASSP2022 Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang 0001, Haizhou Li 0001, 
L-SpEx: Localized Target Speaker Extraction.

ICASSP2022 Cheng Gong, Longbiao Wang, Zhenhua Ling, Ju Zhang, Jianwu Dang 0001, 
Using Multiple Reference Audios and Style Embedding Constraints for Speech Synthesis.

ICASSP2022 Yongjie Lv, Longbiao Wang, Meng Ge, Sheng Li 0010, Chenchen Ding, Lixin Pan, Yuguang Wang, Jianwu Dang 0001, Kiyoshi Honda, 
Compressing Transformer-Based ASR Model by Task-Driven Loss and Attention-Based Multi-Level Feature Distillation.

ICASSP2022 Yaodong Song, Jiaxing Liu, Longbiao Wang, Ruiguo Yu, Jianwu Dang 0001, 
Multi-Stage Graph Representation Learning for Dialogue-Level Speech Emotion Recognition.

ICASSP2022 Kaili Zhang, Cheng Gong, Wenhuan Lu, Longbiao Wang, Jianguo Wei, Dawei Liu, 
Joint and Adversarial Training with ASR for Expressive Speech Synthesis.

ICASSP2022 Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, Hui Chen, 
Learning Domain-Invariant Transformation for Speaker Verification.

ICASSP2022 Xiangyu Zhao, Longbiao Wang, Jianwu Dang 0001, 
Improving Dialogue Generation via Proactively Querying Grounded Knowledge.

Interspeech2022 Yanjie Fu, Meng Ge, Haoran Yin, Xinyuan Qian, Longbiao Wang, Gaoyan Zhang, Jianwu Dang 0001, 
Iterative Sound Source Localization for Unknown Number of Sources.

Interspeech2022 Jiaxu He, Cheng Gong, Longbiao Wang, Di Jin 0001, Xiaobao Wang, Junhai Xu, Jianwu Dang 0001, 
Improve emotional speech synthesis quality by learning explicit and implicit representations with semi-supervised training.

Interspeech2022 Kai Li 0018, Sheng Li 0010, Xugang Lu, Masato Akagi, Meng Liu, Lin Zhang, Chang Zeng, Longbiao Wang, Jianwu Dang 0001, Masashi Unoki, 
Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection.

Interspeech2022 Junjie Li, Meng Ge, Zexu Pan, Longbiao Wang, Jianwu Dang 0001, 
VCSE: Time-Domain Visual-Contextual Speaker Extraction Network.

Interspeech2022 Nan Li, Meng Ge, Longbiao Wang, Masashi Unoki, Sheng Li 0010, Jianwu Dang 0001, 
Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network.

Interspeech2022 Siqing Qin, Longbiao Wang, Sheng Li 0010, Yuqin Lin, Jianwu Dang 0001, 
Finer-grained Modeling units-based Meta-Learning for Low-resource Tibetan Speech Recognition.

Interspeech2022 Hao Shi, Longbiao Wang, Sheng Li 0010, Jianwu Dang 0001, Tatsuya Kawahara, 
Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction.

Interspeech2022 Tongtong Song, Qiang Xu, Meng Ge, Longbiao Wang, Hao Shi, Yongjie Lv, Yuqin Lin, Jianwu Dang 0001, 
Language-specific Characteristic Assistance for Code-switching Speech Recognition.

Interspeech2022 Shiquan Wang, Yuke Si, Xiao Wei, Longbiao Wang, Zhiqiang Zhuang, Xiaowang Zhang, Jianwu Dang 0001, 
TopicKS: Topic-driven Knowledge Selection for Knowledge-grounded Dialogue Generation.

Interspeech2022 Xiao Wei, Yuke Si, Shiquan Wang, Longbiao Wang, Jianwu Dang 0001, 
Hierarchical Tagger with Multi-task Learning for Cross-domain Slot Filling.

Interspeech2022 Qiang Xu, Tongtong Song, Longbiao Wang, Hao Shi, Yuqin Lin, Yongjie Lv, Meng Ge, Qiang Yu 0005, Jianwu Dang 0001, 
Self-Distillation Based on High-level Information Supervision for Compressing End-to-End ASR Model.

#39  | Dan Su 0002 | Google Scholar   DBLP
Venues: Interspeech: 27, ICASSP: 23, ICLR: 1, IJCAI: 1, TASLP: 1
Years: 2022: 13, 2021: 11, 2020: 15, 2019: 8, 2018: 6
ISCA Sections: voice conversion and adaptation: 4; speech synthesis: 4; deep learning for source separation and pitch tracking: 2; speaker embedding and diarization: 1; tools, corpora and resources: 1; topics in asr: 1; source separation, dereverberation and echo cancellation: 1; novel neural network architectures for asr: 1; multi-channel speech enhancement: 1; speaker recognition: 1; asr neural network architectures and training: 1; new trends in self-supervised speech processing: 1; speech synthesis paradigms and methods: 1; multimodal speech processing: 1; speech enhancement: 1; asr neural network architectures: 1; speaker verification using neural network methods: 1; sequence models for asr: 1; expressive speech synthesis: 1; topics in speech recognition: 1
IEEE Keywords: speech recognition: 11; speaker recognition: 8; speech synthesis: 6; natural language processing: 4; recurrent neural nets: 3; multi channel: 3; overlapped speech: 3; speech separation: 3; data augmentation: 3; speaker diarization: 2; voice activity detection: 2; multi look: 2; speech enhancement: 2; speaker verification: 2; domain adaptation: 2; transfer learning: 2; maximum mean discrepancy: 2; code switching: 2; automatic speech recognition: 2; attention based model: 2; end to end speech recognition: 2; graph neural network: 1; text analysis: 1; conversational text to speech synthesis: 1; speaking style: 1; low quality data: 1; neural speech synthesis: 1; style transfer: 1; dual path: 1; acoustic model: 1; dynamic weight attention: 1; echo suppression: 1; joint training: 1; streaming: 1; microphone arrays: 1; router architecture: 1; accent embedding: 1; global information: 1; domain embedding: 1; expert systems: 1; mixture of experts: 1; data handling: 1; m2met: 1; feature fusion: 1; direction of arrival: 1; direction of arrival estimation: 1; neural net architecture: 1; transferable architecture: 1; neural architecture search: 1; single channel: 1; multi granularity: 1; self attentive network: 1; source separation: 1; synthetic speech detection: 1; replay detection: 1; res2net: 1; multi scale feature: 1; asv anti spoofing: 1; transformer: 1; autoregressive processes: 1; decoding: 1; non autoregressive: 1; ctc: 1; speaker verification (sv): 1; speech coding: 1; speech intelligibility: 1; phonetic posteriorgrams: 1; multi channel speech separation: 1; spatial features: 1; end to end: 1; spatial filters: 1; filtering theory: 1; inter channel convolution differences: 1; reverberation: 1; parallel optimization: 1; bmuf: 1; lstm language model: 1; graphics processing units: 1; random sampling: 1; model partition: 1; semi supervised learning: 1; teacher student: 1; accented speech recognition: 1; accent conversion: 1; self attention: 1; persistent memory: 1; dfsmn: 1; language model: 1; asr: 1; acoustic variability: 1; sequence discriminative training: 1; hidden markov models: 1; discriminative feature learning: 1; quasifully recurrent neural network (qrnn): 1; convolutional neural nets: 1; variational inference: 1; text to speech (tts) synthesis: 1; parallel wavenet: 1; convolutional neural network (cnn): 1; parallel processing: 1; feedforward neural nets: 1; knowledge distillation: 1; all rounder: 1; teacher student training: 1; multi domain: 1
Most Publications: 2021: 39, 2022: 34, 2020: 27, 2019: 25, 2018: 9

Affiliations
Tencent AI Lab, Shenzhen, China

ICASSP2022 Jingbei Li, Yi Meng, Chenyi Li, Zhiyong Wu 0001, Helen Meng, Chao Weng, Dan Su 0002
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling.

ICASSP2022 Songxiang Liu, Shan Yang, Dan Su 0002, Dong Yu 0001, 
Referee: Towards Reference-Free Cross-Speaker Style Transfer with Low-Quality Data for Expressive Speech Synthesis.

ICASSP2022 Dongpeng Ma, Yiwen Wang, Liqiang He, Mingjie Jin, Dan Su 0002, Dong Yu 0001, 
DP-DWA: Dual-Path Dynamic Weight Attention Network With Streaming Dfsmn-San For Automatic Speech Recognition.

ICASSP2022 Zhao You, Shulin Feng, Dan Su 0002, Dong Yu 0001, 
Speechmoe2: Mixture-of-Experts Model with Improved Routing.

ICASSP2022 Naijun Zheng, Na Li 0012, Xixin Wu, Lingwei Meng, Jiawen Kang, Haibin Wu, Chao Weng, Dan Su 0002, Helen Meng, 
The CUHK-Tencent Speaker Diarization System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.

ICASSP2022 Naijun Zheng, Na Li 0012, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
Multi-Channel Speaker Diarization Using Spatial Features for Meetings.

Interspeech2022 Yi Lei, Shan Yang, Jian Cong, Lei Xie 0001, Dan Su 0002
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion.

Interspeech2022 Xiaoyi Qin, Na Li 0012, Chao Weng, Dan Su 0002, Ming Li 0026, 
Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings.

Interspeech2022 Liumeng Xue, Shan Yang, Na Hu, Dan Su 0002, Lei Xie 0001, 
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers.

Interspeech2022 Yixuan Zhou, Changhe Song, Jingbei Li, Zhiyong Wu 0001, Yanyao Bian, Dan Su 0002, Helen Meng, 
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis.

Interspeech2022 Yixuan Zhou, Changhe Song, Xiang Li, Luwen Zhang, Zhiyong Wu 0001, Yanyao Bian, Dan Su 0002, Helen Meng, 
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis.

ICLR2022 Max W. Y. Lam, Jun Wang 0091, Dan Su 0002, Dong Yu 0001, 
BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis.

IJCAI2022 Rongjie Huang, Max W. Y. Lam, Jun Wang 0091, Dan Su 0002, Dong Yu 0001, Yi Ren 0006, Zhou Zhao, 
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis.

ICASSP2021 Liqiang He, Dan Su 0002, Dong Yu 0001, 
Learned Transferable Architectures Can Surpass Hand-Designed Architectures for Large Scale Speech Recognition.

ICASSP2021 Max W. Y. Lam, Jun Wang 0091, Dan Su 0002, Dong Yu 0001, 
Sandglasset: A Light Multi-Granularity Self-Attentive Network for Time-Domain Speech Separation.

ICASSP2021 Xu Li, Na Li 0012, Chao Weng, Xunying Liu, Dan Su 0002, Dong Yu 0001, Helen Meng, 
Replay and Synthetic Speech Detection with Res2Net Architecture.

ICASSP2021 Xingchen Song, Zhiyong Wu 0001, Yiheng Huang, Chao Weng, Dan Su 0002, Helen M. Meng, 
Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input.

ICASSP2021 Naijun Zheng, Na Li 0012, Bo Wu, Meng Yu 0003, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
A Joint Training Framework of Multi-Look Separator and Speaker Embedding Extractor for Overlapped Speech.

Interspeech2021 Guoguo Chen, Shuzhou Chai, Guan-Bo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su 0002, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe 0001, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Zhao You, Zhiyong Yan, 
GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10,000 Hours of Transcribed Audio.

Interspeech2021 Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie 0001, Dan Su 0002
Controllable Context-Aware Conversational Speech Synthesis.

#40  | Yu Zhang 0033 | Google Scholar   DBLP
Venues: Interspeech: 26, ICASSP: 23, ICLR: 2, ICML: 1, NeurIPS: 1
Years: 2022: 11, 2021: 15, 2020: 9, 2019: 8, 2018: 3, 2017: 2, 2016: 5
ISCA Section: speech synthesis: 6; spoken language processing: 2; search/decoding techniques and confidence measures for asr: 2; asr neural network architectures and training: 2; training strategies for asr: 2; resource-constrained asr: 1; novel models and training methods for asr: 1; adaptation, transfer learning, and distillation for asr: 1; self-supervised, semi-supervised, adaptation and data augmentation for asr: 1; self-supervision and semi-supervision for neural asr training: 1; non-autoregressive sequential modeling for speech processing: 1; multi- and cross-lingual asr, other topics in asr: 1; asr neural network architectures: 1; asr neural network training: 1; neural network acoustic models for asr: 1; voice conversion: 1; new trends in neural networks for speech recognition: 1
IEEE Keywords: speech recognition: 17; speech synthesis: 8; natural language processing: 6; speech coding: 5; recurrent neural nets: 5; text analysis: 3; speaker recognition: 3; conformer: 3; data augmentation: 3; text to speech: 3; probability: 2; end to end: 2; confidence scores: 2; automatic speech recognition: 2; end to end asr: 2; transformer: 2; rnn t: 2; end to end speech recognition: 2; tacotron 2: 2; multilingual: 2; consistency regularization: 1; self supervised: 1; estimation theory: 1; out of domain: 1; feature selection: 1; rnnt: 1; two pass asr: 1; long form asr: 1; emotion recognition: 1; supervised learning: 1; representation learning: 1; pattern classification: 1; speech: 1; paralinguistics: 1; self supervised learning: 1; non streaming asr: 1; model distillation: 1; streaming asr: 1; iterative methods: 1; self attention: 1; vae: 1; non autoregressive: 1; autoregressive processes: 1; computational complexity: 1; neural tts: 1; latency: 1; cascaded encoders: 1; hidden markov models: 1; calibration: 1; mean square error methods: 1; voice activity detection: 1; attention based end to end models: 1; confidence: 1; echo state network: 1; decoding: 1; echo: 1; long form: 1; multi domain training: 1; optimisation: 1; vocabulary: 1; fine grained vae: 1; regression analysis: 1; hierarchical: 1; tacotron: 1; data efficiency: 1; semi supervised learning: 1; pre training: 1; unpaired data: 1; expert systems: 1; cycle consistency: 1; variational autoencoder: 1; adversarial training: 1; text to speech synthesis: 1; end to end speech synthesis: 1; neural net architecture: 1; wavenet: 1; vocoders: 1; waveform analysis: 1; data selection: 1; bottleneck features: 1; dnn: 1; natural languages: 1; multi task learning: 1; microphones: 1; signal classification: 1; deep neural network: 1; factor representation: 1; integrated adaptation: 1; far field speech recognition: 1; signal representation: 1; i vector: 1; lstm rnns: 1; speaker adaptation: 1; speaking rate: 1; speaker aware training: 1; feedforward neural nets: 1; highway lstm: 1; lstm: 1; cntk: 1; sequence training: 1
Most Publications: 2022: 36, 2021: 35, 2020: 30, 2019: 21, 2017: 15

Affiliations
Google
Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA (PhD 2017)
URLs

ICASSP2022 Zhehuai Chen, Yu Zhang 0033, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno 0001, Gary Wang, 
Tts4pretrain 2.0: Advancing the use of Text and Speech in ASR Pretraining with Consistency and Contrastive Losses.

ICASSP2022 Qiujia Li, Yu Zhang 0033, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland, 
Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition.

ICASSP2022 Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033
Improving The Latency And Quality Of Cascaded Encoders.

ICASSP2022 Joel Shor, Aren Jansen, Wei Han 0002, Daniel S. Park, Yu Zhang 0033
Universal Paralinguistic Speech Representations Using self-Supervised Conformers.

Interspeech2022 Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang 0033, Nicolás Serrano, 
Reducing Domain mismatch in Self-supervised speech pre-training.

Interspeech2022 Zhehuai Chen, Yu Zhang 0033, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno 0001, Ankur Bapna, Heiga Zen, 
MAESTRO: Matched Speech Text Representations through Modality Matching.

Interspeech2022 Alexis Conneau, Ankur Bapna, Yu Zhang 0033, Min Ma, Patrick von Platen, Anton Lozhkov, Colin Cherry, Ye Jia, Clara Rivera, Mihir Kale, Daan van Esch, Vera Axelrod, Simran Khanuja, Jonathan H. Clark, Orhan Firat, Michael Auli, Sebastian Ruder, Jason Riesa, Melvin Johnson, 
XTREME-S: Evaluating Cross-lingual Speech Representations.

Interspeech2022 Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, Tom Kenter, Alexey Petelin, Jonathan Shen, Vincent Wan, Yu Zhang 0033, Yonghui Wu, Rob Clark, 
Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks.

Interspeech2022 Kuan-Po Huang, Yu-Kuan Fu, Yu Zhang 0033, Hung-yi Lee, 
Improving Distortion Robustness of Self-supervised Speech Processing Tasks with Domain Adaptation.

Interspeech2022 Ye Jia, Yifan Ding, Ankur Bapna, Colin Cherry, Yu Zhang 0033, Alexis Conneau, Nobu Morioka, 
Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation.

Interspeech2022 Zhiyun Lu, Yongqiang Wang, Yu Zhang 0033, Wei Han, Zhehuai Chen, Parisa Haghani, 
Unsupervised Data Selection via Discrete Speech Representation for ASR.

ICASSP2021 Thibault Doutre, Wei Han 0002, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang 0033, Liangliang Cao, 
Improving Streaming Automatic Speech Recognition with Non-Streaming Model Distillation on Unsupervised Data.

ICASSP2021 Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang 0033, Ye Jia, Ron J. Weiss, Yonghui Wu, 
Parallel Tacotron: Non-Autoregressive and Controllable TTS.

ICASSP2021 Bo Li 0028, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han 0002, Qiao Liang, Yu Zhang 0033, Trevor Strohman, Yonghui Wu, 
A Better and Faster end-to-end Model for Streaming ASR.

ICASSP2021 Qiujia Li, David Qiu, Yu Zhang 0033, Bo Li 0028, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman, 
Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition.

ICASSP2021 David Qiu, Qiujia Li, Yanzhang He, Yu Zhang 0033, Bo Li 0028, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li 0133, Ke Hu, Tara N. Sainath, Ian McGraw, 
Learning Word-Level Confidence for Subword End-To-End ASR.

ICASSP2021 Harsh Shrivastava 0001, Ankush Garg, Yuan Cao 0007, Yu Zhang 0033, Tara N. Sainath, 
Echo State Speech Recognition.

Interspeech2021 Zhehuai Chen, Andrew Rosenberg, Yu Zhang 0033, Heiga Zen, Mohammadreza Ghodsi, Yinghui Huang, Jesse Emond, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation.

Interspeech2021 Nanxin Chen, Yu Zhang 0033, Heiga Zen, Ron J. Weiss, Mohammad Norouzi 0002, Najim Dehak, William Chan, 
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis.

Interspeech2021 Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang 0033, Ye Jia, R. J. Skerry-Ryan, Yonghui Wu, 
Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling.

#41  | Hermann Ney | Google Scholar   DBLP
Venues: Interspeech: 32, ICASSP: 18, SpeechComm: 1
Years: 2022: 8, 2021: 6, 2020: 13, 2019: 9, 2018: 7, 2017: 5, 2016: 3
ISCA Section: neural networks for language modeling: 3; novel models and training methods for asr: 2; linguistic components in end-to-end asr: 2; search for speech recognition: 2; asr neural network training: 2; asr: 1; neural transducers, streaming asr and novel asr models: 1; language modeling and text-based innovations for asr: 1; keynote: 1; neural network training methods and architectures for asr: 1; novel neural network architectures for asr: 1; asr neural network architectures and training: 1; general topics in speech recognition: 1; training strategies for asr: 1; model adaptation for asr: 1; asr neural network architectures: 1; model training for asr: 1; corpus annotation and evaluation: 1; sequence models for asr: 1; asr systems and technologies: 1; acoustic model adaptation: 1; language modeling: 1; end-to-end speech recognition: 1; acoustic models for asr: 1; neural network acoustic models for asr: 1; acoustic modeling with neural networks: 1
IEEE Keywords: speech recognition: 18; natural language processing: 8; recurrent neural nets: 7; hidden markov models: 5; language modeling: 5; feedforward neural nets: 4; decoding: 3; lstm: 3; switchboard: 2; sequence training: 2; bayes methods: 2; acoustic modeling: 2; asr: 2; optimisation: 2; end to end: 2; librispeech: 1; trees (mathematics): 1; regularization: 1; cart free hybrid hmm: 1; blstm acoustic model: 1; language model integration: 1; beam search: 1; lattice: 1; global normalization: 1; hybrid conformer hmm: 1; transducer: 1; autoregressive processes: 1; language model: 1; dense prediction: 1; vocabulary: 1; resnet: 1; lace: 1; cnn: 1; teacher student learning: 1; domain robustness: 1; self attention: 1; transformer: 1; sequence discriminative training: 1; maximum mutual information: 1; entropy: 1; data augmentation: 1; text analysis: 1; speech synthesis: 1; audio signal processing: 1; speaker recognition: 1; layer normalization: 1; layer normalized lstm: 1; hybrid blstm hmm: 1; ted lium release 2: 1; specaugment: 1; multi dimensional lstm: 1; 2d sequence to sequence model: 1; mean square error methods: 1; student teacher: 1; knowledge distillation: 1; feature representation: 1; waveform: 1; modulation spectrum: 1; convolution: 1; time signal: 1; neural network: 1; rnn: 1; software package: 1; graphics processing units: 1; recurrent neural networks: 1; multi gpu: 1; handwriting recognition: 1; software packages: 1; statistical distributions: 1; convolutional neural networks: 1; keyword search: 1; computation time: 1; parallelization: 1; backpropagation: 1; gradient methods: 1; multi domain: 1; log linear: 1; lm adaptation: 1; interpolation: 1; deep feedforward network: 1
Most Publications: 2012: 52, 2013: 48, 2019: 47, 2011: 44, 2020: 43


ICASSP2022 Tina Raissi, Eugen Beck, Ralf Schlüter, Hermann Ney
Improving Factored Hybrid HMM Acoustic Modeling without State Tying.

ICASSP2022 Nils-Philipp Wynands, Wilfried Michel, Jan Rosendahl, Ralf Schlüter, Hermann Ney
Efficient Sequence Training of Attention Models Using Approximative Recombination.

ICASSP2022 Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Wilfried Michel, Alexander Gerstenberger, Ralf Schlüter, Hermann Ney
Conformer-Based Hybrid ASR System For Switchboard Dataset.

ICASSP2022 Wei Zhou, Zuoyun Zheng, Ralf Schlüter, Hermann Ney
On Language Model Integration for RNN Transducer Based Speech Recognition.

Interspeech2022 Felix Meyer, Wilfried Michel, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney
Automatic Learning of Subword Dependent Model Scales.

Interspeech2022 Zijian Yang, Yingbo Gao, Alexander Gerstenberger, Jintao Jiang, Ralf Schlüter, Hermann Ney
Self-Normalized Importance Sampling for Neural Language Modeling.

Interspeech2022 Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Ralf Schlüter, Hermann Ney
Improving the Training Recipe for a Robust Conformer-based Hybrid Model.

Interspeech2022 Wei Zhou, Wilfried Michel, Ralf Schlüter, Hermann Ney
Efficient Training of Neural Transducer for Speech Recognition.

Interspeech2021 Yingbo Gao, David Thulke, Alexander Gerstenberger, Khoa Viet Tran, Ralf Schlüter, Hermann Ney
On Sampling-Based Training Criteria for Neural Language Modeling.

Interspeech2021 Hermann Ney
Forty Years of Speech and Language Processing: From Bayes Decision Rule to Deep Learning.

Interspeech2021 Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney
Investigating Methods to Improve Language Model Integration for Attention-Based Encoder-Decoder ASR Models.

Interspeech2021 Albert Zeyer, André Merboldt, Wilfried Michel, Ralf Schlüter, Hermann Ney
Librispeech Transducer Model with Internal Language Model Prior Correction.

Interspeech2021 Wei Zhou, Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney
Equivalence of Segmental and Neural Transducer Modeling: A Proof of Concept.

Interspeech2021 Wei Zhou, Mohammad Zeineldeen, Zuoyun Zheng, Ralf Schlüter, Hermann Ney
Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition.

ICASSP2020 Vitalii Bozheniuk, Albert Zeyer, Ralf Schlüter, Hermann Ney
A Comprehensive Study of Residual CNNS for Acoustic Modeling in ASR.

ICASSP2020 Alexander Gerstenberger, Kazuki Irie, Pavel Golik, Eugen Beck, Hermann Ney
Domain Robust, Fast, and Compact Neural Language Models.

ICASSP2020 Kazuki Irie, Alexander Gerstenberger, Ralf Schlüter, Hermann Ney
How Much Self-Attention Do We Need? Trading Attention for Feed-Forward Layers.

ICASSP2020 Wilfried Michel, Ralf Schlüter, Hermann Ney
Frame-Level MMI as A Sequence Discriminative Training Criterion for LVCSR.

ICASSP2020 Nick Rossenbach, Albert Zeyer, Ralf Schlüter, Hermann Ney
Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems.

ICASSP2020 Mohammad Zeineldeen, Albert Zeyer, Ralf Schlüter, Hermann Ney
Layer-Normalized LSTM for Hybrid-Hmm and End-To-End ASR.

#42  | Satoshi Nakamura 0001 | Google Scholar   DBLP
Venues: Interspeech: 33, TASLP: 9, ICASSP: 6, SpeechComm: 3
Years: 2022: 5, 2021: 6, 2020: 8, 2019: 7, 2018: 6, 2017: 7, 2016: 12
ISCA Section: spoken machine translation: 4; speech synthesis: 2; speech translation: 2; special session: 2; speaking styles and interaction styles: 1; self-supervised, semi-supervised, adaptation and data augmentation for asr: 1; low-resource speech recognition: 1; lm adaptation, lexical units and punctuation: 1; general topics in speech recognition: 1; neural signals for spoken communication: 1; the zero resource speech challenge 2020: 1; topics in asr: 1; turn management in dialogue: 1; search methods for speech recognition: 1; speech in the brain: 1; the zero resource speech challenge 2019: 1; sequence models for asr: 1; acoustic model adaptation: 1; integrating speech science and technology for clinical applications: 1; statistical parametric speech synthesis: 1; acoustic models for asr: 1; speech synthesis prosody: 1; cognition and brain studies: 1; speech translation and metadata for linguistic/discourse structure: 1; automatic learning of representations: 1; co-inference of production and acoustics: 1; low resource speech recognition: 1
IEEE Keywords: speech synthesis: 7; speech recognition: 6; gaussian processes: 3; natural language processing: 3; recurrent neural nets: 2; zerospeech: 2; unsupervised phoneme discovery: 2; dpgmm: 2; unsupervised learning: 2; speech chain: 2; asr: 2; signal reconstruction: 2; tts: 2; blind source separation: 2; regression analysis: 2; hidden markov models: 2; speech enhancement: 2; lombard effect: 1; speech intelligibility: 1; machine speech chain inference: 1; acoustic noise: 1; text to speech: 1; dynamic adaptation: 1; signal denoising: 1; low resource asr: 1; hearing: 1; infant speech perception: 1; engrams: 1; functional load: 1; rnn: 1; perception of phonemes: 1; automatic speech recognition: 1; transformer: 1; video signal processing: 1; ctc: 1; hybrid asr: 1; emotion recognition: 1; affective computing: 1; chat based dialogue system: 1; human computer interaction: 1; information retrieval: 1; emotion elicitation: 1; interactive systems: 1; electroencephalography: 1; medical signal processing: 1; brain: 1; speech artifact removal: 1; independent component analysis: 1; spoken word production: 1; tensor decomposition: 1; eeg: 1; cognition: 1; neurophysiology: 1; straight through estimator: 1; end to end feedback loss: 1; dirichlet process: 1; mixture of mixtures: 1; unsupervised subword modeling: 1; monte carlo methods: 1; gibbs sampling: 1; acoustic unit discovery: 1; bayesian nonparametrics: 1; markov processes: 1; language translation: 1; emphasis estimation: 1; word level emphasis: 1; speech to speech translation: 1; emphasis translation: 1; intent: 1; pattern classification: 1; post filter: 1; modulation spectrum: 1; trees (mathematics): 1; smoothing methods: 1; gmm based voice conversion: 1; clustergen: 1; mixture models: 1; global variance: 1; oversmoothing: 1; statistical parametric speech synthesis: 1; spectral analysis: 1; statistical singing voice conversion: 1; waveform analysis: 1; spectral differential: 1; filtering theory: 1; f0 transformation: 1; direct waveform modification: 1; cross gender conversion: 1; nonaudible murmur microphone: 1; microphones: 1; semi blind source separation: 1; interference suppression: 1; silent speech communication: 1; external noise monitoring: 1; noise suppression: 1; generative model: 1; product of experts: 1; f0 prediction: 1; electrolaryngeal speech enhancement: 1
Most Publications: 2018: 54, 2014: 53, 2015: 47, 2020: 45, 2017: 44

Affiliations
Nara Institute of Science and Technology, Ikoma, Japan
ATR Spoken Language Communication Labs, Kyoto, Japan
National Institute of Information and Communications Technology (NICT), Spoken Language Communication Group, Keihanna Science City, Japan
Sharp Corporation, Nara, Japan
Kyoto University, Japan (PhD 1992)

TASLP2022 Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura 0001
A Machine Speech Chain Approach for Dynamically Adaptive Lombard TTS in Static and Dynamic Noise Environments.

TASLP2022 Bin Wu, Sakriani Sakti, Jinsong Zhang 0001, Satoshi Nakamura 0001
Modeling Unsupervised Empirical Adaptation by DPGMM and DPGMM-RNN Hybrid Model to Extract Perceptual Features for Low-Resource ASR.

Interspeech2022 Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura 0001
Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation.

Interspeech2022 Seiya Kawano, Muteki Arioka, Akishige Yuguchi, Kenta Yamamoto, Koji Inoue, Tatsuya Kawahara, Satoshi Nakamura 0001, Koichiro Yoshino, 
Multimodal Persuasive Dialogue Corpus using Teleoperated Android.

Interspeech2022 Heli Qi, Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura 0001
Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR via Speech Chain Reconstruction and Self-Transcribing.

TASLP2021 Bin Wu, Sakriani Sakti, Jinsong Zhang 0001, Satoshi Nakamura 0001
Tackling Perception Bias in Unsupervised Phoneme Discovery Using DPGMM-RNN Hybrid Model and Functional Load.

Interspeech2021 Johanes Effendi, Sakriani Sakti, Satoshi Nakamura 0001
Weakly-Supervised Speech-to-Text Mapping with Visually Connected Non-Parallel Speech-Text Data Using Cyclic Partially-Aligned Transformer.

Interspeech2021 Yuka Ko, Katsuhito Sudoh, Sakriani Sakti, Satoshi Nakamura 0001
ASR Posterior-Based Loss for Multi-Task End-to-End Speech Translation.

Interspeech2021 Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura 0001
Dynamically Adaptive Machine Speech Chain Inference for TTS in Noisy Environment: Listen and Speak Louder.

Interspeech2021 Shun Takahashi, Sakriani Sakti, Satoshi Nakamura 0001
Unsupervised Neural-Based Graph Clustering for Variable-Length Speech Representation Discovery of Zero-Resource Languages.

Interspeech2021 Hirotaka Tokuyama, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura 0001
Transcribing Paralinguistic Acoustic Cues to Target Language Text in Transformer-Based Speech-to-Text Translation.

TASLP2020 Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001
Machine Speech Chain.

TASLP2020 Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001
Corrections to "Machine Speech Chain".

ICASSP2020 Andros Tjandra, Chunxi Liu, Frank Zhang 0001, Xiaohui Zhang, Yongqiang Wang 0005, Gabriel Synnaeve, Satoshi Nakamura 0001, Geoffrey Zweig, 
DEJA-VU: Double Feature Presentation and Iterated Loss in Deep Transformer Networks.

Interspeech2020 Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001
Augmenting Images for ASR and TTS Through Single-Loop and Dual-Loop Multimodal Chain Framework.

Interspeech2020 Sashi Novitasari, Andros Tjandra, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura 0001
Incremental Machine Speech Chain Towards Enabling Listening While Speaking in Real-Time.

Interspeech2020 Ivan Halim Parmonangan, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura 0001
Combining Audio and Brain Activity for Predicting Speech Quality.

Interspeech2020 Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001
Transformer VQ-VAE for Unsupervised Unit Discovery and Speech Synthesis: ZeroSpeech 2020 Challenge.

Interspeech2020 Kazuki Tsunematsu, Johanes Effendi, Sakriani Sakti, Satoshi Nakamura 0001
Neural Speech Completion.

TASLP2019 Nurul Lubis, Sakriani Sakti, Koichiro Yoshino, Satoshi Nakamura 0001
Positive Emotion Elicitation in Chat-Based Dialogue Systems.

#43  | Keisuke Kinoshita | Google Scholar   DBLP
Venues: Interspeech: 29, ICASSP: 19, TASLP: 2, SpeechComm: 1
Years: 2022: 6, 2021: 11, 2020: 11, 2019: 8, 2018: 5, 2017: 7, 2016: 3
ISCA Section: source separation: 3; speech enhancement: 3; multi-channel speech enhancement: 2; dereverberation, noise reduction, and speaker extraction: 1; speaker embedding and diarization: 1; single-channel speech enhancement: 1; speaker diarization: 1; source separation, dereverberation and echo cancellation: 1; speech localization, enhancement, and quality assessment: 1; speech enhancement and intelligibility: 1; noise reduction and intelligibility: 1; monaural source separation: 1; diarization: 1; targeted source separation: 1; asr for noisy and far-field speech: 1; speech and audio source separation and scene analysis: 1; distant asr: 1; speech intelligibility and quality: 1; source separation and auditory scene analysis: 1; far-field speech recognition: 1; speech-enhancement: 1; speech intelligibility: 1; acoustic model adaptation: 1; speech enhancement and noise reduction: 1
IEEE Keywords: speech recognition: 15; speaker recognition: 9; speech enhancement: 9; source separation: 6; blind source separation: 4; reverberation: 4; neural network: 4; array signal processing: 3; robust asr: 3; backpropagation: 3; gaussian processes: 2; mixture models: 2; speech extraction: 2; speech separation: 2; convolution: 2; convolutional neural nets: 2; online processing: 2; dynamic stream weights: 2; audio signal processing: 2; optimisation: 2; beamforming: 2; dereverberation: 2; target speech extraction: 2; time domain network: 2; time domain analysis: 2; hidden markov models: 2; joint training: 2; adaptation: 2; auxiliary feature: 2; speaker extraction: 2; infinite gmm: 1; bayes methods: 1; diarization: 1; pattern clustering: 1; noise robust speech recognition: 1; speakerbeam: 1; deep learning (artificial intelligence): 1; input switching: 1; expectation maximization (em) algorithm: 1; microphones: 1; covariance analysis: 1; blind source separation (bss): 1; full rank spatial covariance analysis (fca): 1; gaussian distribution: 1; expectation maximisation algorithm: 1; multivariate complex gaussian distribution: 1; complex backpropagation: 1; transfer functions: 1; signal to distortion ratio: 1; multi channel source separation: 1; acoustic beamforming: 1; meeting recognition: 1; speaker activity: 1; continuous speech separation: 1; long recording speech separation: 1; dual path modeling: 1; transforms: 1; audio visual systems: 1; audiovisual speaker localization: 1; sensor fusion: 1; data fusion: 1; video signal processing: 1; image fusion: 1; maximum likelihood estimation: 1; automatic speech recognition: 1; filtering theory: 1; microphone array: 1; spatial features: 1; multi task loss: 1; microphone arrays: 1; single channel speech enhancement: 1; signal denoising: 1; time domain: 1; frequency domain analysis: 1; multi speaker speech recognition: 1; computational complexity: 1; end to end speech recognition: 1; tracking: 1; recurrent neural nets: 1; backprop kalman filter: 1; audiovisual speaker tracking: 1; kalman filters: 1; iterative methods: 1; joint optimization: 1; least squares approximations: 1; source counting: 1; meeting diarization: 1; speech separation/extraction: 1; speaker attention: 1; acoustic modeling: 1; adaptive training: 1; deep neural network: 1; acoustic model adaptation: 1; feedforward neural nets: 1; speech mixtures: 1; spatial filters: 1; speaker adaptive neural network: 1; context adaptation: 1; spatial diffuseness features: 1; cnn based acoustic model: 1; environmental robustness: 1; inverse problems: 1; conditional density: 1; model based feature enhancement: 1; mixture density network: 1
Most Publications: 2021: 39, 2020: 28, 2019: 20, 2022: 18, 2017: 16

Affiliations
URLs

ICASSP2022 Keisuke Kinoshita, Marc Delcroix, Tomoharu Iwata, 
Tight Integration Of Neural- And Clustering-Based Diarization Through Deep Unfolding Of Infinite Gaussian Mixture Model.

ICASSP2022 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Naoyuki Kamo, Takafumi Moriya, 
Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition.

ICASSP2022 Hiroshi Sawada, Rintaro Ikeshita, Keisuke Kinoshita, Tomohiro Nakatani, 
Multi-Frame Full-Rank Spatial Covariance Analysis for Underdetermined BSS in Reverberant Environments.

Interspeech2022 Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolíková, Hiroshi Sato, Tomohiro Nakatani, 
Listen only to me! How well can target speech extraction handle false alarms?

Interspeech2022 Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Christoph Böddeker, Reinhold Haeb-Umbach, 
Utterance-by-utterance overlap-aware neural diarization with Graph-PIT.

Interspeech2022 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura, 
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.

ICASSP2021 Christoph Böddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Reinhold Haeb-Umbach, 
Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.

ICASSP2021 Marc Delcroix, Katerina Zmolíková, Tsubasa Ochiai, Keisuke Kinoshita, Tomohiro Nakatani, 
Speaker Activity Driven Neural Speech Extraction.

ICASSP2021 Chenda Li, Zhuo Chen 0006, Yi Luo 0004, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe 0001, Yanmin Qian, 
Dual-Path Modeling for Long Recording Speech Separation in Meetings.

ICASSP2021 Julio Wissing, Benedikt T. Boenninghoff, Dorothea Kolossa, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Christopher Schymura, 
Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain.

Interspeech2021 Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Shoko Araki, 
Few-Shot Learning of New Sound Classes for Target Sound Extraction.

Interspeech2021 Cong Han, Yi Luo 0004, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe 0001, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen 0006, 
Continuous Speech Separation Using Speaker Inventory for Long Recording.

Interspeech2021 Keisuke Kinoshita, Marc Delcroix, Naohiro Tawara, 
Advances in Integration of End-to-End Neural and Clustering-Based Diarization for Real Conversational Speech.

Interspeech2021 Thilo von Neumann, Keisuke Kinoshita, Christoph Böddeker, Marc Delcroix, Reinhold Haeb-Umbach, 
Graph-PIT: Generalized Permutation Invariant Training for Continuous Separation of Arbitrary Numbers of Speakers.

Interspeech2021 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoyuki Kamo, 
Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition.

Interspeech2021 Christopher Schymura, Benedikt T. Bönninghoff, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa, 
PILOT: Introducing Transformers for Probabilistic Sound Event Localization.

Interspeech2021 Ayako Yamamoto, Toshio Irino, Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, 
Comparison of Remote Experiments Using Crowdsourcing and Laboratory Experiments on Speech Intelligibility.

SpeechComm2020 Katsuhiko Yamamoto, Toshio Irino, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani, 
GEDI: Gammachirp envelope distortion index for predicting intelligibility of enhanced speech.

TASLP2020 Tomohiro Nakatani, Christoph Böddeker, Keisuke Kinoshita, Rintaro Ikeshita, Marc Delcroix, Reinhold Haeb-Umbach, 
Jointly Optimal Denoising, Dereverberation, and Source Separation.

ICASSP2020 Marc Delcroix, Tsubasa Ochiai, Katerina Zmolíková, Keisuke Kinoshita, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki, 
Improving Speaker Discrimination of Target Speech Extraction With Time-Domain Speakerbeam.

#44  | Ralf Schlüter | Google Scholar   DBLP
Venues: Interspeech: 33, ICASSP: 17, SpeechComm: 1
Years: 2022: 8, 2021: 6, 2020: 12, 2019: 10, 2018: 7, 2017: 5, 2016: 3
ISCA Section: neural networks for language modeling: 3; novel models and training methods for asr: 2; linguistic components in end-to-end asr: 2; search for speech recognition: 2; asr neural network training: 2; end-to-end speech recognition: 2; asr: 1; neural transducers, streaming asr and novel asr models: 1; language modeling and text-based innovations for asr: 1; applications in transcription, education and learning: 1; neural network training methods and architectures for asr: 1; novel neural network architectures for asr: 1; asr neural network architectures and training: 1; general topics in speech recognition: 1; training strategies for asr: 1; model adaptation for asr: 1; asr neural network architectures: 1; model training for asr: 1; corpus annotation and evaluation: 1; sequence models for asr: 1; asr systems and technologies: 1; acoustic model adaptation: 1; language modeling: 1; acoustic models for asr: 1; neural network acoustic models for asr: 1; acoustic modeling with neural networks: 1
IEEE Keywordspeech recognition: 17natural language processing: 7recurrent neural nets: 7hidden markov models: 5language modeling: 4feedforward neural nets: 4decoding: 3lstm: 3switchboard: 2sequence training: 2bayes methods: 2acoustic modeling: 2optimisation: 2end to end: 2librispeech: 1trees (mathematics): 1regularization: 1cart free hybrid hmm: 1blstm acoustic model: 1language model integration: 1beam search: 1lattice: 1global normalization: 1hybrid conformer hmm: 1transducer: 1autoregressive processes: 1language model: 1dense prediction: 1vocabulary: 1resnet: 1lace: 1cnn: 1self attention: 1transformer: 1sequence discriminative training: 1maximum mutual information: 1entropy: 1data augmentation: 1text analysis: 1speech synthesis: 1audio signal processing: 1speaker recognition: 1layer normalization: 1layer normalized lstm: 1hybrid blstm hmm: 1ted lium release 2: 1specaugment: 1multi dimensional lstm: 12d sequence to sequence model: 1mean square error methods: 1student teacher: 1knowledge distillation: 1asr: 1feature representation: 1waveform: 1modulation spectrum: 1convolution: 1time signal: 1neural network: 1rnn: 1software package: 1graphics processing units: 1recurrent neural networks: 1multi gpu: 1handwriting recognition: 1software packages: 1statistical distributions: 1convolutional neural networks: 1keyword search: 1computation time: 1parallelization: 1backpropagation: 1gradient methods: 1multi domain: 1log linear: 1lm adaptation: 1interpolation: 1deep feedforward network: 1
Most Publications2021: 272019: 242020: 232013: 212011: 21


ICASSP2022 Tina Raissi, Eugen Beck, Ralf Schlüter, Hermann Ney, 
Improving Factored Hybrid HMM Acoustic Modeling without State Tying.

ICASSP2022 Nils-Philipp Wynands, Wilfried Michel, Jan Rosendahl, Ralf Schlüter, Hermann Ney, 
Efficient Sequence Training of Attention Models Using Approximative Recombination.

ICASSP2022 Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Wilfried Michel, Alexander Gerstenberger, Ralf Schlüter, Hermann Ney, 
Conformer-Based Hybrid ASR System For Switchboard Dataset.

ICASSP2022 Wei Zhou, Zuoyun Zheng, Ralf Schlüter, Hermann Ney, 
On Language Model Integration for RNN Transducer Based Speech Recognition.

Interspeech2022 Felix Meyer, Wilfried Michel, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney, 
Automatic Learning of Subword Dependent Model Scales.

Interspeech2022 Zijian Yang, Yingbo Gao, Alexander Gerstenberger, Jintao Jiang, Ralf Schlüter, Hermann Ney, 
Self-Normalized Importance Sampling for Neural Language Modeling.

Interspeech2022 Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Ralf Schlüter, Hermann Ney, 
Improving the Training Recipe for a Robust Conformer-based Hybrid Model.

Interspeech2022 Wei Zhou, Wilfried Michel, Ralf Schlüter, Hermann Ney, 
Efficient Training of Neural Transducer for Speech Recognition.

Interspeech2021 Yingbo Gao, David Thulke, Alexander Gerstenberger, Khoa Viet Tran, Ralf Schlüter, Hermann Ney, 
On Sampling-Based Training Criteria for Neural Language Modeling.

Interspeech2021 Yu Qiao 0005, Wei Zhou, Elma Kerz, Ralf Schlüter
The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech.

Interspeech2021 Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney, 
Investigating Methods to Improve Language Model Integration for Attention-Based Encoder-Decoder ASR Models.

Interspeech2021 Albert Zeyer, André Merboldt, Wilfried Michel, Ralf Schlüter, Hermann Ney, 
Librispeech Transducer Model with Internal Language Model Prior Correction.

Interspeech2021 Wei Zhou, Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney, 
Equivalence of Segmental and Neural Transducer Modeling: A Proof of Concept.

Interspeech2021 Wei Zhou, Mohammad Zeineldeen, Zuoyun Zheng, Ralf Schlüter, Hermann Ney, 
Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition.

ICASSP2020 Vitalii Bozheniuk, Albert Zeyer, Ralf Schlüter, Hermann Ney, 
A Comprehensive Study of Residual CNNs for Acoustic Modeling in ASR.

ICASSP2020 Kazuki Irie, Alexander Gerstenberger, Ralf Schlüter, Hermann Ney, 
How Much Self-Attention Do We Need? Trading Attention for Feed-Forward Layers.

ICASSP2020 Wilfried Michel, Ralf Schlüter, Hermann Ney, 
Frame-Level MMI as A Sequence Discriminative Training Criterion for LVCSR.

ICASSP2020 Nick Rossenbach, Albert Zeyer, Ralf Schlüter, Hermann Ney, 
Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems.

ICASSP2020 Mohammad Zeineldeen, Albert Zeyer, Ralf Schlüter, Hermann Ney, 
Layer-Normalized LSTM for Hybrid-HMM and End-to-End ASR.

ICASSP2020 Wei Zhou, Wilfried Michel, Kazuki Irie, Markus Kitza, Ralf Schlüter, Hermann Ney, 
The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment.

#45  | Daniel Povey | Google Scholar   DBLP
Venues: Interspeech: 36; ICASSP: 13; TASLP: 1
Years: 2022: 1; 2021: 5; 2020: 8; 2019: 9; 2018: 16; 2017: 8; 2016: 3
ISCA Sections: speaker recognition evaluation: 3; tools, corpora and resources: 2; the voices from a distance challenge: 2; acoustic models for asr: 2; neural transducers, streaming asr and novel asr models: 1; feature extraction and distant asr: 1; lm adaptation, lexical units and punctuation: 1; neural networks for language modeling: 1; multilingual and code-switched asr: 1; asr neural network architectures and training: 1; summarization, semantic analysis and classification: 1; representation learning of emotion and paralinguistics: 1; spoken language processing for children’s speech: 1; speaker recognition and diarization: 1; recurrent neural models for asr: 1; novel neural network architectures for acoustic modelling: 1; robust speech recognition: 1; speaker state and trait: 1; end-to-end speech recognition: 1; language modeling: 1; acoustic modelling: 1; representation learning for emotion: 1; the first dihard speech diarization challenge: 1; speaker verification using neural network methods: 1; search, computational strategies and language modeling: 1; speaker recognition: 1; spoken term detection: 1; lexical and pronunciation modeling: 1; acoustic modeling with neural networks: 1; far-field speech processing: 1; topics in speech recognition: 1
IEEE Keywords: speech recognition: 10; automatic speech recognition: 8; natural language processing: 4; decoding: 3; transformer: 3; speaker recognition: 3; deep neural networks: 3; decoder: 2; lattice rescoring: 2; speaker diarization: 2; x vectors: 2; lattice free mmi: 2; recurrent neural nets: 2; speech coding: 1; lattice pruning: 1; lattice generation: 1; parallel computation: 1; neural language models: 1; parallel processing: 1; convolutional neural nets: 1; voice activity detection: 1; lf mmi: 1; streaming: 1; computational complexity: 1; wake word detection: 1; gradient methods: 1; parallel computing: 1; optimisation: 1; edge: 1; graphics processing units: 1; multiprocessing systems: 1; wfst: 1; language model adaptation: 1; linear interpolation: 1; neural language model: 1; interpolation: 1; merging: 1; hidden markov models: 1; flat start: 1; lattice free: 1; maximum mutual information: 1; single stage: 1; semi supervised training: 1; sequence training: 1; asr: 1; neural network: 1; lstm: 1; attention: 1; data augmentation: 1; recurrent neural network language model: 1; approximation theory: 1; heuristic search: 1; probability: 1; importance sampling: 1; vocabulary: 1; recurrent neural networks: 1; language modeling: 1; signal representation: 1; clustering: 1; end to end learning: 1; pattern clustering: 1; augmentation: 1; room impulse responses: 1; deep neural network: 1; reverberation: 1
Most Publications: 2018: 22; 2020: 16; 2019: 15; 2021: 13; 2015: 13


Interspeech2022 Fangjun Kuang, Liyong Guo, Wei Kang 0006, Long Lin, Mingshuang Luo, Zengwei Yao, Daniel Povey
Pruned RNN-T for fast, memory-efficient ASR training.

ICASSP2021 Hang Lv 0001, Zhehuai Chen, Hainan Xu, Daniel Povey, Lei Xie 0001, Sanjeev Khudanpur, 
An Asynchronous WFST-Based Decoder for Automatic Speech Recognition.

ICASSP2021 Ke Li 0018, Daniel Povey, Sanjeev Khudanpur, 
A Parallelizable Lattice Rescoring Strategy with Neural Language Models.

ICASSP2021 Yiming Wang, Hang Lv 0001, Daniel Povey, Lei Xie 0001, Sanjeev Khudanpur, 
Wake Word Detection with Streaming Transformers.

Interspeech2021 Guoguo Chen, Shuzhou Chai, Guan-Bo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su 0002, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe 0001, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Zhao You, Zhiyong Yan, 
GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10,000 Hours of Transcribed Audio.

Interspeech2021 Junbo Zhang, Zhiwen Zhang, Yongqing Wang, Zhiyong Yan, Qiong Song, Yukai Huang, Ke Li 0018, Daniel Povey, Yujun Wang, 
speechocean762: An Open-Source Non-Native English Speech Corpus for Pronunciation Assessment.

ICASSP2020 Hugo Braun, Justin Luitjens, Ryan Leary, Tim Kaldewey, Daniel Povey
Gpu-Accelerated Viterbi Exact Lattice Decoder for Batched Online and Offline Speech Recognition.

ICASSP2020 Ke Li 0018, Zhe Liu 0011, Tianxing He, Hongzhao Huang, Fuchun Peng, Daniel Povey, Sanjeev Khudanpur, 
An Empirical Study of Transformer-Based Neural Language Model Adaptation.

Interspeech2020 Pegah Ghahramani, Hossein Hadian, Daniel Povey, Hynek Hermansky, Sanjeev Khudanpur, 
An Alternative to MFCCs for ASR.

Interspeech2020 Ruizhe Huang, Ke Li 0018, Ashish Arora, Daniel Povey, Sanjeev Khudanpur, 
Efficient MDI Adaptation for n-Gram Language Models.

Interspeech2020 Ke Li 0018, Daniel Povey, Sanjeev Khudanpur, 
Neural Language Modeling with Implicit Cache Pointers.

Interspeech2020 Srikanth R. Madikeri, Banriskhem K. Khonglah, Sibo Tong, Petr Motlícek, Hervé Bourlard, Daniel Povey
Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems.

Interspeech2020 Yiwen Shao, Yiming Wang, Daniel Povey, Sanjeev Khudanpur, 
PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR.

Interspeech2020 Yiming Wang, Hang Lv 0001, Daniel Povey, Lei Xie 0001, Sanjeev Khudanpur, 
Wake Word Detection with Alignment-Free Lattice-Free MMI.

ICASSP2019 David Snyder, Daniel Garcia-Romero, Gregory Sell, Alan McCree, Daniel Povey, Sanjeev Khudanpur, 
Speaker Recognition for Multi-speaker Conversations Using X-vectors.

Interspeech2019 Daniel Garcia-Romero, David Snyder, Gregory Sell, Alan McCree, Daniel Povey, Sanjeev Khudanpur, 
x-Vector DNN Refinement with Full-Length Recordings for Speaker Recognition.

Interspeech2019 Daniel Garcia-Romero, David Snyder, Shinji Watanabe 0001, Gregory Sell, Alan McCree, Daniel Povey, Sanjeev Khudanpur, 
Speaker Recognition Benchmark Using the CHiME-5 Corpus.

Interspeech2019 Mousmita Sarma, Pegah Ghahremani, Daniel Povey, Nagendra Kumar Goel, Kandarpa Kumar Sarma, Najim Dehak, 
Improving Emotion Identification Using Phone Posteriors in Raw Speech Waveform Based DNN.

Interspeech2019 David Snyder, Jesús Villalba 0001, Nanxin Chen, Daniel Povey, Gregory Sell, Najim Dehak, Sanjeev Khudanpur, 
The JHU Speaker Recognition System for the VOiCES 2019 Challenge.

Interspeech2019 Jesús Villalba 0001, Nanxin Chen, David Snyder, Daniel Garcia-Romero, Alan McCree, Gregory Sell, Jonas Borgstrom, Fred Richardson, Suwon Shon, François Grondin, Réda Dehak, Leibny Paola García-Perera, Daniel Povey, Pedro A. Torres-Carrasquillo, Sanjeev Khudanpur, Najim Dehak, 
State-of-the-Art Speaker Recognition for Telephone and Video Speech: The JHU-MIT Submission for NIST SRE18.

#46  | Tan Lee | Google Scholar   DBLP
Venues: Interspeech: 35; ICASSP: 10; TASLP: 3; SpeechComm: 1
Years: 2022: 10; 2021: 5; 2020: 10; 2019: 12; 2018: 6; 2017: 3; 2016: 3
ISCA Sections: speech synthesis: 5; speech emotion recognition: 3; embedding and network architecture for speaker recognition: 2; multimodal speech emotion recognition and paralinguistics: 2; phonetics: 1; speaker and language recognition: 1; atypical speech detection: 1; assessment of pathological speech and language: 1; voice anti-spoofing and countermeasure: 1; dnn architectures for speaker recognition: 1; language learning: 1; speech, language, and multimodal resources: 1; zero-resource asr: 1; the zero resource speech challenge 2019: 1; network architectures for emotion and paralinguistics recognition: 1; speech and language analytics for medical applications: 1; speech and voice disorders: 1; model adaptation for asr: 1; adjusting to speaker, accent, and domain: 1; zero-resource speech recognition: 1; deception, personality, and culture attribute: 1; speech pathology, depression, and medical applications: 1; special session: 1; speech recognition: 1; acoustic model adaptation: 1; acoustic modeling with neural networks: 1; speech and language processing for clinical health applications: 1
IEEE Keywords: speech recognition: 7; speaker recognition: 4; signal classification: 3; gaussian processes: 3; text analysis: 2; bayesian learning: 2; bayes methods: 2; lhuc: 2; computational linguistics: 2; convolutional neural nets: 2; speaker adaptation: 2; unsupervised learning: 2; natural language processing: 2; hidden markov models: 2; asr: 2; aphasia: 2; speech assessment: 2; pre training: 1; text to speech: 1; speech synthesis: 1; data reduction: 1; tdnn: 1; adaptation: 1; switchboard: 1; biometrics: 1; electroencephalography: 1; medical signal processing: 1; biometrics (access control): 1; neurophysiology: 1; connectivity: 1; multivariate empirical mode decomposition: 1; resting state eeg: 1; hilbert transforms: 1; speech coding: 1; signal representation: 1; signal reconstruction: 1; mixture factorized auto encoder: 1; speaker verification: 1; unsupervised subword modeling: 1; unsupervised deep factorization: 1; matrix decomposition: 1; feature decomposition: 1; convolutional neural network: 1; time frequency analysis: 1; median filtering: 1; acoustic scene classification: 1; wavelet transforms: 1; sound duration: 1; audio signal processing: 1; median filters: 1; multi task learning: 1; robust features: 1; zero resource: 1; continuous speech: 1; voice assessment: 1; probability: 1; posterior features: 1; dnn based asr system: 1; acoustic features: 1; emotion recognition: 1; subspace based gmm: 1; speech emotion recognition: 1; hybrid dnn hmm: 1; adversarial learning: 1; domain mismatch: 1; unsupervised adaptation: 1; language recognition: 1; pattern clustering: 1; phone posteriorgrams: 1; cnn: 1; maximum likelihood estimation: 1; brain: 1; speech: 1; cantonese: 1; word embedding: 1; pathological speech: 1; acoustical analysis: 1; objective assessment: 1; automatic speech recognition: 1
Most Publications: 2022: 28; 2020: 24; 2021: 23; 2019: 19; 2010: 15


ICASSP2022 Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan 0003, Sheng Zhao, Tan Lee
A Study on the Efficacy of Model Pre-Training In Developing Neural Text-to-Speech System.

Interspeech2022 Jonathan Him Nok Lee, Dehua Tao, Harold Chui, Tan Lee, Sarah Luk, Nicolette Wing Tung Lee, Koonkan Fung, 
Durational Patterning at Discourse Boundaries in Relation to Therapist Empathy in Psychotherapy.

Interspeech2022 Jingyu Li, Wei Liu, Tan Lee
EDITnet: A Lightweight Network for Unsupervised Domain Adaptation in Speaker Verification.

Interspeech2022 Si Ioi Ng, Cymie Wing-Yee Ng, Jiarui Wang, Tan Lee
Automatic Detection of Speech Sound Disorder in Child Speech Using Posterior-based Speaker Representations.

Interspeech2022 Zhiyuan Peng, Xuanji He, Ke Ding, Tan Lee, Guanglu Wan, 
Unifying Cosine and PLDA Back-ends for Speaker Verification.

Interspeech2022 Daxin Tan, Guangyan Zhang, Tan Lee
Environment Aware Text-to-Speech Synthesis.

Interspeech2022 Dehua Tao, Tan Lee, Harold Chui, Sarah Luk, 
Characterizing Therapist's Speaking Style in Relation to Empathy in Psychotherapy.

Interspeech2022 Dehua Tao, Tan Lee, Harold Chui, Sarah Luk, 
Hierarchical Attention Network for Evaluating Therapist Empathy in Counseling Session.

Interspeech2022 Yusheng Tian, Jingyu Li, Tan Lee
Transport-Oriented Feature Aggregation for Speaker Embedding Learning.

Interspeech2022 Guangyan Zhang, Kaitao Song, Xu Tan 0003, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao, 
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech.

TASLP2021 Xurong Xie, Xunying Liu, Tan Lee, Lan Wang, 
Bayesian Learning for Deep Neural Network Adaptation.

Interspeech2021 Si Ioi Ng, Cymie Wing-Yee Ng, Jingyu Li, Tan Lee
Detection of Consonant Errors in Disordered Speech Based on Consonant-Vowel Segment Embedding.

Interspeech2021 Zhiyuan Peng, Xu Li, Tan Lee
Pairing Weak with Strong: Twin Models for Defending Against Adversarial Attack on Speaker Verification.

Interspeech2021 Daxin Tan, Tan Lee
Fine-Grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement.

Interspeech2021 Guangyan Zhang, Ying Qin, Daxin Tan, Tan Lee
Applying the Information Bottleneck Principle to Prosodic Representation Learning.

ICASSP2020 Matthew King-Hang Ma, Tan Lee, Manson Cheuk-Man Fong, William Shi-Yuan Wang, 
Resting-State EEG-Based Biometrics with Signals Features Extracted by Multivariate Empirical Mode Decomposition.

ICASSP2020 Zhiyuan Peng, Siyuan Feng 0001, Tan Lee
Mixture Factorized Auto-Encoder for Unsupervised Hierarchical Deep Factorization of Speech Signal.

ICASSP2020 Yuzhong Wu, Tan Lee
Time-Frequency Feature Decomposition Based on Sound Duration for Acoustic Scene Classification.

Interspeech2020 Jingyu Li, Tan Lee
Text-Independent Speaker Verification with Dual Attention Network.

Interspeech2020 Shuiyang Mao, P. C. Ching, C.-C. Jay Kuo, Tan Lee
Advancing Multiple Instance Learning with Attention Modeling for Categorical Speech Emotion Recognition.

#47  | S. R. Mahadeva Prasanna | Google Scholar   DBLP
Venues: Interspeech: 33; SpeechComm: 8; TASLP: 6; ICASSP: 2
Years: 2022: 2; 2021: 2; 2020: 6; 2019: 6; 2018: 14; 2017: 13; 2016: 6
ISCA Sections: speech and voice disorders: 3; acoustic analysis-synthesis of speech disorders: 3; voice, speech and hearing disorders: 2; special session: 2; speaker and language recognition: 1; speech type classification and diagnosis: 1; language and accent recognition: 1; speech in health: 1; phonetic event detection and segmentation: 1; speech and speaker recognition: 1; show and tell 6: 1; show and tell 7: 1; speech recognition for indian languages: 1; speech segments and voice quality: 1; spoofing detection: 1; measuring pitch and articulation: 1; integrating speech science and technology for clinical applications: 1; spoken term detection: 1; speech and singing production: 1; speech synthesis: 1; phonation and voice quality: 1; speech and audio segmentation and classification: 1; speaker and language recognition applications: 1; speech analysis and representation: 1; prosody, phonation and voice quality: 1; speech and language processing for clinical health applications: 1; speech coding and audio processing for noise reduction: 1
IEEE Keywords: time frequency analysis: 3; speech recognition: 3; cepstral analysis: 3; hidden markov models: 3; support vector machines: 2; single pole filter: 2; glottal closure instants: 2; filtering theory: 2; speech music classification: 1; spectrogram: 1; spectral peak tracking: 1; time frequency audio features: 1; probability: 1; gaussian processes: 1; gmm: 1; audio signal processing: 1; music: 1; signal classification: 1; svm: 1; cnn: 1; and vowel onset point: 1; cleft lip and palate: 1; consonant vowel transitions: 1; discrete cosine transforms: 1; fourier transforms: 1; misarticulated stops: 1; cleft palate: 1; and velopharyngeal dysfunction: 1; nasalized voiced stops: 1; epochs: 1; neural net architecture: 1; electroglottograph: 1; generative adversarial network: 1; glottal opening instants: 1; time frequency representation: 1; telephone quality speech: 1; telephone sets: 1; epoch extraction: 1; time marginal: 1; voice activity detection: 1; sonority: 1; phoneme recognition: 1; hilbert transforms: 1; zero time windowing: 1; system: 1; suprasegmental: 1; source: 1; mlsa: 1; speech enhancement: 1; foreground segmentation: 1; zero band filter: 1; approximation theory: 1; mcc: 1; soe: 1; hts: 1; residual mceps: 1; frequency domain analysis: 1; speech synthesis: 1; source modeling: 1; integrated lp residual: 1
Most Publications: 2018: 29; 2017: 24; 2019: 23; 2022: 22; 2021: 18


SpeechComm2022 Mrinmoy Bhattacharjee, S. R. Mahadeva Prasanna, Prithwijit Guha, 
Speech/music classification using phase-based and magnitude-based features.

Interspeech2022 Moakala Tzudir, Priyankoo Sarmah, S. R. Mahadeva Prasanna
Prosodic Information in Dialect Identification of a Tonal Language: The case of Ao.

Interspeech2021 Shikha Baghel, Mrinmoy Bhattacharjee, S. R. Mahadeva Prasanna, Prithwijit Guha, 
Automatic Detection of Shouted Speech Segments in Indian News Debates.

Interspeech2021 Moakala Tzudir, Shikha Baghel, Priyankoo Sarmah, S. R. Mahadeva Prasanna
Excitation Source Feature Based Dialect Identification in Ao - A Low Resource Language.

SpeechComm2020 Protima Nomo Sudro, S. R. Mahadeva Prasanna
Enhancement of cleft palate speech using temporal and spectral processing.

SpeechComm2020 Akhilesh Kumar Dubey, S. R. Mahadeva Prasanna, Samarendra Dandapat, 
Sinusoidal model-based hypernasality detection in cleft palate speech using CVCV sequence.

TASLP2020 Mrinmoy Bhattacharjee, S. R. Mahadeva Prasanna, Prithwijit Guha, 
Speech/Music Classification Using Features From Spectral Peaks.

TASLP2020 Vikram C. Mathad, S. R. Mahadeva Prasanna
Vowel Onset Point Based Screening of Misarticulated Stops in Cleft Lip and Palate Speech.

Interspeech2020 Ajish K. Abraham, M. Pushpavathi, N. Sreedevi, A. Navya, Vikram C. Mathad, S. R. Mahadeva Prasanna
Spectral Moment and Duration of Burst of Plosives in Speech of Children with Hearing Impairment and Typically Developing Children - A Comparative Study.

Interspeech2020 Ayush Agarwal, Jagabandhu Mishra, S. R. Mahadeva Prasanna
VOP Detection in Variable Speech Rate Condition.

TASLP2019 Vikram C. M., Nagaraj Adiga, S. R. Mahadeva Prasanna
Detection of Nasalized Voiced Stops in Cleft Palate Speech Using Epoch-Synchronous Features.

ICASSP2019 K. T. Deepak, Pavitra Kulkarni, U. Mudenagudi, S. R. M. Prasanna
Glottal Instants Extraction from Speech Signal Using Generative Adversarial Network.

Interspeech2019 Akhilesh Kumar Dubey, S. R. Mahadeva Prasanna, Samarendra Dandapat, 
Hypernasality Severity Detection Using Constant Q Cepstral Coefficients.

Interspeech2019 Sarfaraz Jelil, Abhishek Shrivastava, Rohan Kumar Das, S. R. Mahadeva Prasanna, Rohit Sinha 0003, 
SpeechMarker: A Voice Based Multi-Level Attendance Application.

Interspeech2019 Sishir Kalita, Protima Nomo Sudro, S. R. Mahadeva Prasanna, Samarendra Dandapat, 
Nasal Air Emission in Sibilant Fricatives of Cleft Lip and Palate Speech.

Interspeech2019 Protima Nomo Sudro, S. R. Mahadeva Prasanna
Modification of Devoicing Error in Cleft Lip and Palate Speech.

SpeechComm2018 Rajib Sharma, Ramesh K. Bhukya, S. R. M. Prasanna
Analysis of the Hilbert Spectrum for Text-Dependent Speaker Verification.

SpeechComm2018 Bidisha Sharma, S. R. Mahadeva Prasanna
Significance of sonority information for voiced/unvoiced decision in speech synthesis.

Interspeech2018 Kishalay Chakraborty, Senjam Shantirani Devi, Sanjeevan Devnath, S. R. Mahadeva Prasanna, Priyankoo Sarmah, 
Glotto Vibrato Graph: A Device and Method for Recording, Analysis and Visualization of Glottal Activity.

Interspeech2018 Abhishek Dey, Abhash Deka, Siddika Imani, Barsha Deka, Rohit Sinha 0003, S. R. Mahadeva Prasanna, Priyankoo Sarmah, K. Samudravijaya, S. R. Nirmala, 
AGROASSAM: A Web Based Assamese Speech Recognition Application for Retrieving Agricultural Commodity Price and Weather Information.

#48  | Yonghong Yan 0002 | Google Scholar   DBLP
Venues: Interspeech: 31; ICASSP: 7; TASLP: 6; SpeechComm: 4
Years: 2023: 1; 2022: 12; 2021: 6; 2020: 2; 2019: 10; 2018: 6; 2017: 6; 2016: 5
ISCA Sections: novel models and training methods for asr: 3; speech synthesis: 2; noise robust and far-field asr: 2; speaker embedding and diarization: 1; asr: 1; spoken language processing: 1; multi-, cross-lingual and other topics in asr: 1; atypical speech analysis and detection: 1; low-resource asr development: 1; source separation, dereverberation and echo cancellation: 1; speech recognition and beyond: 1; lexicon and language model for speech recognition: 1; asr neural network training: 1; speaker and language recognition: 1; asr for noisy and far-field speech: 1; model adaptation for asr: 1; acoustic scenes and rare events: 1; novel neural network architectures for acoustic modelling: 1; neural network training strategies for asr: 1; spoken dialogue systems and conversational analysis: 1; source separation and spatial analysis: 1; language modeling: 1; acoustic models for asr: 1; source separation and voice activity detection: 1; source separation and auditory scene analysis: 1; music, audio, and source separation: 1; speech enhancement and noise reduction: 1
IEEE Keywords: speech recognition: 7; end to end: 3; end to end speech recognition: 2; natural language processing: 2; text analysis: 2; deep neural network: 2; direction of arrival estimation: 2; recurrent neural nets: 2; speech enhancement: 2; hidden markov models: 1; hybrid dnn hmm speech recognition: 1; mixture models: 1; automatic speech recognition: 1; gaussian processes: 1; probability: 1; entropy: 1; long tailed problem: 1; pattern classification: 1; supervised learning: 1; self supervised pre training: 1; signal classification: 1; random processes: 1; mean square error methods: 1; source separation: 1; time frequency mask: 1; weighted histogram analysis: 1; steering vector phase difference: 1; sound source localization: 1; speech coding: 1; pre training: 1; grammars: 1; transformer: 1; unpaired data: 1; data compression: 1; pruning: 1; matrix algebra: 1; model compression: 1; multilayer perceptrons: 1; matrix product operators: 1; interpretability: 1; convolutional neural nets: 1; autoregressive moving average processes: 1; autoregressive moving average: 1; neural language models: 1; complex deep neural network: 1; hearing: 1; interference suppression: 1; complex ideal ratio mask: 1; binaural speech enhancement: 1; binaural cue preservation: 1; different timescales: 1; t vector: 1; speaker embedding: 1; biological research: 1; speaker recognition: 1; elevation perception: 1; elevation control: 1; binaural synthesis: 1; spectral cues: 1; head related transfer function: 1; wake up word speech recognition: 1; far field speech recognition: 1; signal detection: 1; direction of arrival: 1; regression analysis: 1; microphones: 1; spatial resolution: 1; speech source localization: 1; time delay histogram: 1; spatial aliasing: 1; query processing: 1; example generation: 1; example quality assessment: 1; spoken term detection: 1; multiple examples utilization: 1; query by example: 1
Most Publications: 2008: 29; 2022: 26; 2021: 25; 2012: 25; 2009: 24

Affiliations
Chinese Academy of Sciences, Institute of Acoustics / Xinjiang Technical Institute of Physics and Chemistry, China

SpeechComm2023 Feng Dang, Hangting Chen, Qi Hu, Pengyuan Zhang, Yonghong Yan 0002
First coarse, fine afterward: A lightweight two-stage complex approach for monaural speech enhancement.

TASLP2022 Gaofeng Cheng, Haoran Miao, Runyan Yang, Keqi Deng, Yonghong Yan 0002
ETEH: Unified Attention-Based End-to-End ASR and KWS Architecture.

TASLP2022 Keqi Deng, Gaofeng Cheng, Runyan Yang, Yonghong Yan 0002
Alleviating ASR Long-Tailed Problem by Decoupling the Learning of Representation and Classification.

TASLP2022 Changfeng Gao, Gaofeng Cheng, Ta Li, Pengyuan Zhang, Yonghong Yan 0002
Self-Supervised Pre-Training for Attention-Based Encoder-Decoder ASR Model.

Interspeech2022 Yifan Chen, Yifan Guo, Qingxuan Li, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan 0002
Interrelate Training and Searching: A Unified Online Clustering Framework for Speaker Diarization.

Interspeech2022 Zehan Li, Haoran Miao, Keqi Deng, Gaofeng Cheng, Sanli Tian, Ta Li, Yonghong Yan 0002
Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies.

Interspeech2022 Yukun Liu, Ta Li, Pengyuan Zhang, Yonghong Yan 0002
NAS-SCAE: Searching Compact Attention-based Encoders For End-to-end Automatic Speech Recognition.

Interspeech2022 Sanli Tian, Keqi Deng, Zehan Li, Lingxuan Ye, Gaofeng Cheng, Ta Li, Yonghong Yan 0002
Knowledge Distillation For CTC-based Speech Recognition Via Consistent Acoustic Representation Learning.

Interspeech2022 Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, Lei Xie, Yonghong Yan 0002
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset.

Interspeech2022 Lingxuan Ye, Gaofeng Cheng, Runyan Yang, Zehui Yang, Sanli Tian, Pengyuan Zhang, Yonghong Yan 0002
Improving Recognition of Out-of-vocabulary Words in E2E Code-switching ASR by Fusing Speech Generation Methods.

Interspeech2022 Xueshuai Zhang, Jiakun Shen, Jun Zhou, Pengyuan Zhang, Yonghong Yan 0002, Zhihua Huang, Yanfen Tang, Yu Wang, Fujie Zhang, Shaoxing Zhang, Aijun Sun, 
Robust Cough Feature Extraction and Classification Method for COVID-19 Cough Detection Based on Vocalization Characteristics.

Interspeech2022 Han Zhu, Li Wang, Gaofeng Cheng, Jindong Wang 0001, Pengyuan Zhang, Yonghong Yan 0002
Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR.

Interspeech2022 Han Zhu, Jindong Wang 0001, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan 0002
Decoupled Federated Learning for ASR with Non-IID Data.

SpeechComm2021 Danyang Liu, Ji Xu, Pengyuan Zhang, Yonghong Yan 0002
A unified system for multilingual speech recognition and language identification.

TASLP2021 Longbiao Cheng, Xingwei Sun, Dingding Yao, Junfeng Li, Yonghong Yan 0002
Estimation Reliability Function Assisted Sound Source Localization With Enhanced Steering Vector Phase Difference.

ICASSP2021 Changfeng Gao, Gaofeng Cheng, Runyan Yang, Han Zhu, Pengyuan Zhang, Yonghong Yan 0002
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Text Data.

Interspeech2021 Jianjun Gu 0005, Longbiao Cheng, Xingwei Sun, Junfeng Li, Yonghong Yan 0002
Residual Echo and Noise Cancellation with Feature Attention Module and Multi-Domain Loss Function.

Interspeech2021 Zengqiang Shang, Zhihua Huang, Haozhe Zhang, Pengyuan Zhang, Yonghong Yan 0002
Incorporating Cross-Speaker Style Transfer for Multi-Language Text-to-Speech.

Interspeech2021 Haozhe Zhang, Zhihua Huang, Zengqiang Shang, Pengyuan Zhang, Yonghong Yan 0002
LinearSpeech: Parallel Text-to-Speech with Linear Complexity.

SpeechComm2020 Fan Yang, Ziteng Wang, Junfeng Li, Risheng Xia, Yonghong Yan 0002
Improving generative adversarial networks for speech enhancement through regularization of latent representations.

#49  | Tatsuya Kawahara | Google Scholar   DBLP
VenuesInterspeech: 30ICASSP: 13TASLP: 4NAACL: 1
Years2022: 82021: 42020: 92019: 92018: 102017: 52016: 3
ISCA Sectionturn management in dialogue: 2dialog modeling: 2asr: 1multi-, cross-lingual and other topics in asr: 1speaking styles and interaction styles: 1asr technologies and systems: 1dereverberation, noise reduction, and speaker extraction: 1streaming for asr/rnn transducers: 1search/decoding techniques and confidence measures for asr: 1spoken dialogue system: 1speech emotion recognition: 1neural networks for language modeling: 1asr neural network architectures and training: 1streaming asr: 1topics in asr: 1conversational systems: 1cross-lingual and multilingual asr: 1spoken term detection, confidence measure, and end-to-end speech recognition: 1nn architectures for asr: 1training strategy for speech emotion recognition: 1multimodal dialogue systems: 1spoken dialogue systems and conversational analysis: 1acoustic modelling: 1recurrent neural models for asr: 1adjusting to speaker, accent, and domain: 1noise robust speech recognition: 1dialogue: 1far-field, robustness and adaptation: 1
IEEE Keywords: speech recognition: 11; matrix decomposition: 4; natural language processing: 4; covariance matrices: 3; blind source separation: 3; speaker recognition: 3; unsupervised learning: 3; pattern classification: 3; joint diagonalization: 2; maximum likelihood estimation: 2; audio signal processing: 2; speech synthesis: 2; speech coding: 2; decoding: 2; linguistics: 2; speech enhancement: 2; variational autoencoder: 2; pronunciation error detection: 2; articulation modeling: 2; computer assisted pronunciation training (capt): 2; dnn: 2; acoustic model: 2; iterative methods: 1; covariance analysis: 1; dereverberation: 1; autoregressive moving average processes: 1; source separation: 1; reverberation: 1; optimisation: 1; multichannel audio signal processing: 1; transformer: 1; domain adaptation: 1; fastspeech 2: 1; emotion recognition: 1; multiple corpora: 1; multi task learning: 1; self attention mechanism: 1; speech emotion recognition: 1; language translation: 1; conditional masked language model: 1; encoding: 1; multiprocessing systems: 1; autoregressive processes: 1; non autoregressive decoding: 1; end to end speech translation: 1; non native acoustic modeling: 1; capt: 1; cross lingual transfer: 1; pronunciation error detection and diagnosis: 1; call: 1; time frequency analysis: 1; multichannel nonnegative matrix factorization: 1; blind source separation (bss): 1; image representation: 1; full rank spatial covariance matrix: 1; gaussian distribution: 1; supervised learning: 1; multichannel speech enhancement: 1; matrix algebra: 1; nonnegative matrix factorization: 1; signal denoising: 1; vocabulary: 1; low resource language: 1; multilingual speech recognition: 1; end to end asr: 1; transfer learning: 1; text analysis: 1; acoustic to word model: 1; sequence to sequence speech recognition: 1; sequence to sequence speech synthesis: 1; multi speaker speech synthesis: 1; training data augmentation: 1; single channel speech enhancement: 1; bayesian signal processing: 1; multi label dnn: 1; attribute label correction: 1; social signals: 1; end to end training: 1; automatic speech recognition: 1; hidden markov models: 1; social networking (online): 1; connectionist temporal classification: 1; audio visual systems: 1; behavior analysis: 1; conversation analysis: 1; humanoid robots: 1; engagement: 1; audio visual signal processing: 1; human robot interaction: 1; acoustic to word: 1; connectionist temporal classification (ctc): 1; signal classification: 1; multitask learning: 1; word processing: 1; end to end speech recognition: 1; attention: 1; computer aided instruction: 1; multi lingual learning: 1; semi supervised training: 1; lecture transcription: 1; unsupervised training: 1
Most Publications: 2020: 32; 2018: 28; 2019: 24; 2021: 23; 2017: 21


TASLP2022 Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Mathieu Fontaine 0002, Kazuyoshi Yoshii, Tatsuya Kawahara
Autoregressive Moving Average Jointly-Diagonalizable Spatial Covariance Analysis for Joint Source Separation and Dereverberation.

ICASSP2022 Sei Ueno, Tatsuya Kawahara
Phone-Informed Refinement of Synthesized Mel Spectrogram for Data Augmentation in Speech Recognition.

ICASSP2022 Heran Zhang, Masato Mimura, Tatsuya Kawahara, Kenkichi Ishizuka, 
Selective Multi-Task Learning For Speech Emotion Recognition Using Corpora Of Different Styles.

Interspeech2022 Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM.

Interspeech2022 Soky Kak, Sheng Li 0010, Masato Mimura, Chenhui Chu, Tatsuya Kawahara
Leveraging Simultaneous Translation for Enhancing Transcription of Low-resource Language via Cross Attention Mechanism.

Interspeech2022 Seiya Kawano, Muteki Arioka, Akishige Yuguchi, Kenta Yamamoto, Koji Inoue, Tatsuya Kawahara, Satoshi Nakamura 0001, Koichiro Yoshino, 
Multimodal Persuasive Dialogue Corpus using Teleoperated Android.

Interspeech2022 Jumon Nozaki, Tatsuya Kawahara, Kenkichi Ishizuka, Taiichi Hashimoto, 
End-to-end Speech-to-Punctuated-Text Recognition.

Interspeech2022 Hao Shi, Longbiao Wang, Sheng Li 0010, Jianwu Dang 0001, Tatsuya Kawahara
Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction.

ICASSP2021 Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe 0001, 
ORTHROS: non-autoregressive end-to-end speech translation With dual-decoder.

Interspeech2021 Hirofumi Inaguma, Tatsuya Kawahara
StableEmit: Selection Probability Discount for Reducing Emission Latency of Streaming Monotonic Attention ASR.

Interspeech2021 Hirofumi Inaguma, Tatsuya Kawahara
VAD-Free Streaming Hybrid CTC/Attention ASR for Unsegmented Recording.

NAACL2021 Hirofumi Inaguma, Tatsuya Kawahara, Shinji Watanabe 0001, 
Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation.

TASLP2020 Richeng Duan, Tatsuya Kawahara, Masatake Dantsuji, Hiroaki Nanjo, 
Cross-Lingual Transfer Learning of Non-Native Acoustic Modeling for Pronunciation Error Detection and Diagnosis.

TASLP2020 Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Kazuyoshi Yoshii, Tatsuya Kawahara
Fast Multichannel Nonnegative Matrix Factorization With Directivity-Aware Jointly-Diagonalizable Spatial Covariance Matrices for Blind Source Separation.

Interspeech2020 Viet-Trung Dang, Tianyu Zhao, Sei Ueno, Hirofumi Inaguma, Tatsuya Kawahara
End-to-End Speech-to-Dialog-Act Recognition.

Interspeech2020 Han Feng, Sei Ueno, Tatsuya Kawahara
End-to-End Speech Emotion Recognition Combined with Acoustic-to-Word ASR Model.

Interspeech2020 Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Distilling the Knowledge of BERT for Sequence-to-Sequence ASR.

Interspeech2020 Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara
CTC-Synchronous Training for Monotonic Attention Model.

Interspeech2020 Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara
Enhancing Monotonic Multihead Attention for Streaming ASR.

Interspeech2020 Kohei Matsuura, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Generative Adversarial Training Data Adaptation for Very Low-Resource Automatic Speech Recognition.

#50  | Bhuvana Ramabhadran | Google Scholar   DBLP
Venues: Interspeech: 29; ICASSP: 19
Years: 2022: 8; 2021: 7; 2020: 5; 2019: 3; 2018: 5; 2017: 14; 2016: 6
ISCA Sections: novel models and training methods for asr: 3; neural network training methods for asr: 2; speech synthesis: 2; neural network acoustic models for asr: 2; neural networks for language modeling: 2; resource-constrained asr: 1; asr: 1; self-supervised, semi-supervised, adaptation and data augmentation for asr: 1; streaming for asr/rnn transducers: 1; speech recognition of atypical speech: 1; self-supervision and semi-supervision for neural asr training: 1; asr neural network architectures and training: 1; training strategies for asr: 1; multilingual and code-switched asr: 1; cross-lingual and multilingual asr: 1; adjusting to speaker, accent, and domain: 1; perspective talk: 1; prosody and text processing: 1; conversational telephone speech recognition: 1; spoken term detection: 1; new trends in neural networks for speech recognition: 1; acoustic model adaptation: 1; low resource speech recognition: 1
IEEE Keywords: speech recognition: 16; natural language processing: 9; speech synthesis: 4; multilingual: 4; text analysis: 3; rnn t: 3; data augmentation: 3; automatic speech recognition: 3; keyword search: 3; speaker recognition: 2; end to end speech recognition: 2; recurrent neural nets: 2; feedforward neural nets: 2; acoustic model: 2; word embeddings: 2; consistency regularization: 1; self supervised: 1; n best rescoring: 1; speech normalization: 1; speech impairments: 1; sequence to sequence model: 1; voice conversion: 1; language id: 1; mixture of experts: 1; gradient methods: 1; transliteration: 1; language independent: 1; error statistics: 1; speech coding: 1; entropy: 1; language model adaptation: 1; code switched automatic speech recognition: 1; hidden markov models: 1; direct acoustics to word models: 1; decoding: 1; end to end models: 1; computational linguistics: 1; prosody prediction: 1; low resources: 1; multi task learning: 1; acoustic modeling: 1; multi accent speech recognition: 1; end to end models: 1; query processing: 1; audio coding: 1; end to end systems: 1; neural network training: 1; sampling methods: 1; performance evaluation: 1; cnn: 1; transforms: 1; joint training: 1; neural network: 1; denoising autoencoder: 1; channel bank filters: 1; signal denoising: 1; harmonic structure: 1; time frequency analysis: 1; feature fusion: 1; attention networks: 1; ctc: 1; lstm: 1; vgg: 1; representation learning: 1; programming language semantics: 1; language modeling: 1; error analysis: 1; regression analysis: 1; one vs one multi class classification: 1; random fourier features: 1; large scale kernel machines: 1; deep neural networks: 1; prosodic phrasing: 1; prominence prediction: 1; unsupervised learning: 1
Most Publications: 2017: 26; 2011: 24; 2022: 23; 2013: 17; 2014: 16


ICASSP2022 Zhehuai Chen, Yu Zhang 0033, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno 0001, Gary Wang, 
Tts4pretrain 2.0: Advancing the use of Text and Speech in ASR Pretraining with Consistency and Contrastive Losses.

ICASSP2022 Neeraj Gaur, Tongzhou Chen, Ehsan Variani, Parisa Haghani, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Multilingual Second-Pass Rescoring for Automatic Speech Recognition Systems.

Interspeech2022 Kartik Audhkhasi, Yinghui Huang, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition.

Interspeech2022 Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang 0033, Nicolás Serrano, 
Reducing Domain mismatch in Self-supervised speech pre-training.

Interspeech2022 Zhehuai Chen, Yu Zhang 0033, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno 0001, Ankur Bapna, Heiga Zen, 
MAESTRO: Matched Speech Text Representations through Modality Matching.

Interspeech2022 Ehsan Variani, Michael Riley 0001, David Rybach, Cyril Allauzen, Tongzhou Chen, Bhuvana Ramabhadran
On Adaptive Weight Interpolation of the Hybrid Autoregressive Transducer.

Interspeech2022 Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, W. Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach, 
Improving Rare Word Recognition with LM-aware MWER Training.

Interspeech2022 Gary Wang, Andrew Rosenberg, Bhuvana Ramabhadran, Fadi Biadsy, Jesse Emond, Yinghui Huang, Pedro J. Moreno 0001, 
Non-Parallel Voice Conversion for ASR Augmentation.

ICASSP2021 Rohan Doshi, Youzheng Chen, Liyang Jiang, Xia Zhang, Fadi Biadsy, Bhuvana Ramabhadran, Fang Chu, Andrew Rosenberg, Pedro J. Moreno 0001, 
Extending Parrotron: An End-to-End, Speech Conversion and Speech Recognition Model for Atypical Speech.

ICASSP2021 Neeraj Gaur, Brian Farris, Parisa Haghani, Isabel Leal, Pedro J. Moreno 0001, Manasa Prasad, Bhuvana Ramabhadran, Yun Zhu, 
Mixture of Informed Experts for Multilingual Speech Recognition.

Interspeech2021 Kartik Audhkhasi, Tongzhou Chen, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Mixture Model Attention: Flexible Streaming and Non-Streaming Automatic Speech Recognition.

Interspeech2021 Zhehuai Chen, Bhuvana Ramabhadran, Fadi Biadsy, Xia Zhang, Youzheng Chen, Liyang Jiang, Fang Chu, Rohan Doshi, Pedro J. Moreno 0001, 
Conformer Parrotron: A Faster and Stronger End-to-End Speech Conversion and Recognition Model for Atypical Speech.

Interspeech2021 Zhehuai Chen, Andrew Rosenberg, Yu Zhang 0033, Heiga Zen, Mohammadreza Ghodsi, Yinghui Huang, Jesse Emond, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation.

Interspeech2021 Isabel Leal, Neeraj Gaur, Parisa Haghani, Brian Farris, Pedro J. Moreno 0001, Manasa Prasad, Bhuvana Ramabhadran, Yun Zhu, 
Self-Adaptive Distillation for Multilingual Speech Recognition: Leveraging Student Independence.

Interspeech2021 Hainan Xu, Kartik Audhkhasi, Yinghui Huang, Jesse Emond, Bhuvana Ramabhadran
Regularizing Word Segmentation by Creating Misspellings.

ICASSP2020 Arindrima Datta, Bhuvana Ramabhadran, Jesse Emond, Anjuli Kannan, Brian Roark, 
Language-Agnostic Multilingual Modeling.

ICASSP2020 Gary Wang, Andrew Rosenberg, Zhehuai Chen, Yu Zhang 0033, Bhuvana Ramabhadran, Yonghui Wu, Pedro J. Moreno 0001, 
Improving Speech Recognition Using Consistent Predictions on Synthesized Speech.

Interspeech2020 Zhehuai Chen, Andrew Rosenberg, Yu Zhang 0033, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Improving Speech Recognition Using GAN-Based Speech Synthesis and Contrastive Unspoken Text Selection.

Interspeech2020 Gary Wang, Andrew Rosenberg, Zhehuai Chen, Yu Zhang 0033, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
SCADA: Stochastic, Consistent and Adversarial Data Augmentation to Improve ASR.

Interspeech2020 Yun Zhu, Parisa Haghani, Anshuman Tripathi, Bhuvana Ramabhadran, Brian Farris, Hainan Xu, Han Lu, Hasim Sak, Isabel Leal, Neeraj Gaur, Pedro J. Moreno 0001, Qian Zhang, 
Multilingual Speech Recognition with Self-Attention Structured Parameterization.

#51  | Mark J. F. Gales | Google Scholar   DBLP
Venues: Interspeech: 27; ICASSP: 11; TASLP: 7; SpeechComm: 1; ACL: 1
Years: 2022: 2; 2021: 3; 2020: 8; 2019: 6; 2018: 9; 2017: 8; 2016: 11
ISCA Sections: spoken language evaluations: 3; speech synthesis: 2; language learning and databases: 2; applications in education and learning: 2; new products and services: 2; applications in transcription, education and learning: 1; automatic speech recognition for non-native children’s speech: 1; pronunciation: 1; summarization, semantic analysis and classification: 1; language modeling: 1; recurrent neural models for asr: 1; statistical parametric speech synthesis: 1; acoustic model adaptation: 1; neural networks for language modeling: 1; speech recognition: 1; conversational telephone speech recognition: 1; acoustic models for asr: 1; robustness in speech processing: 1; topics in speech recognition: 1; new trends in neural networks for speech recognition: 1; decoding, system combination: 1
IEEE Keywords: speech recognition: 16; natural language processing: 8; recurrent neural nets: 6; recurrent neural network: 5; keyword search: 4; language model: 3; information retrieval: 2; confidence: 2; spoken language assessment: 2; lattice: 2; neural network: 2; probability: 2; television broadcasting: 2; audio signal processing: 2; joint decoding: 2; gpu: 2; graph structures: 1; attention: 1; language translation: 1; end to end training: 1; spoken language processing: 1; speech translation: 1; embedding passing: 1; regression analysis: 1; bias in deep learning: 1; concept activation vectors: 1; deep learning (artificial intelligence): 1; sub word: 1; quality control: 1; succeeding words: 1; feedforward: 1; lattice free: 1; teacher student: 1; ensemble: 1; automatic speech recognition: 1; random forest: 1; grammatical error detection: 1; computer aided instruction: 1; call: 1; linguistics: 1; bi directional recurrent neural network: 1; confidence estimation: 1; confusion network: 1; acoustic model: 1; text to speech: 1; pulse model: 1; frequency domain analysis: 1; speech synthesis: 1; voice: 1; vocoders: 1; parametric speech synthesis: 1; interpretability: 1; activation regularisation: 1; visualisation: 1; i vectors: 1; speaker adaptation: 1; deep neural networks: 1; multi basis adaptive neural networks: 1; computational linguistics: 1; query processing: 1; statistical distributions: 1; morph to word transduction: 1; single index: 1; stimulated training: 1; limited resources: 1; variance regularisation: 1; language models: 1; graphics processing units: 1; pipelined training: 1; noise contrastive: 1; estimation: 1; source code (software): 1; open source toolkit: 1; audio segmentation: 1; deep neural network: 1; pattern clustering: 1; multi genre broadcast data: 1; error analysis: 1; speech coding: 1; log linear model: 1; hybrid system: 1; tandem system: 1; structured svm: 1
Most Publications: 2015: 26; 2011: 25; 2013: 24; 2018: 21; 2014: 21


TASLP2022 Anton Ragni, Mark J. F. Gales, Oliver Rose, Katherine Knill, Alexandros Kastanos, Qiujia Li, Preben Ness, 
Increasing Context for Estimating Confidence Scores in Automatic Speech Recognition.

Interspeech2022 Stefano Bannò, Bhanu Balusu, Mark J. F. Gales, Kate Knill, Konstantinos Kyriakopoulos, 
View-Specific Assessment of L2 Spoken English.

ICASSP2021 Yiting Lu, Yu Wang 0027, Mark J. F. Gales
Efficient Use of End-to-End Data in Spoken Language Processing.

ICASSP2021 Xizi Wei, Mark J. F. Gales, Kate M. Knill, 
Analysing Bias in Spoken Language Assessment Using Concept Activation Vectors.

Interspeech2021 Qingyun Dou, Xixin Wu, Moquan Wan, Yiting Lu, Mark J. F. Gales
Deliberation-Based Multi-Pass Speech Synthesis.

ICASSP2020 Alexandros Kastanos, Anton Ragni, Mark J. F. Gales
Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks.

Interspeech2020 Qingyun Dou, Joshua Efiong, Mark J. F. Gales
Attention Forcing for Speech Synthesis.

Interspeech2020 Kate M. Knill, Linlin Wang, Yu Wang 0027, Xixin Wu, Mark J. F. Gales
Non-Native Children's Automatic Speech Recognition: The INTERSPEECH 2020 Shared Task ALTA Systems.

Interspeech2020 Konstantinos Kyriakopoulos, Kate M. Knill, Mark J. F. Gales
Automatic Detection of Accent and Lexical Pronunciation Errors in Spontaneous Non-Native English Speech.

Interspeech2020 Yiting Lu, Mark J. F. Gales, Yu Wang 0027, 
Spoken Language 'Grammatical Error Correction'.

Interspeech2020 Potsawee Manakul, Mark J. F. Gales, Linlin Wang, 
Abstractive Spoken Document Summarization Using Hierarchical Model with Multi-Stage Attention Diversity Optimization.

Interspeech2020 Vyas Raina, Mark J. F. Gales, Kate M. Knill, 
Universal Adversarial Attacks on Spoken Language Assessment Systems.

Interspeech2020 Xixin Wu, Kate M. Knill, Mark J. F. Gales, Andrey Malinin, 
Ensemble Approaches for Uncertainty in Spoken Language Assessment.

TASLP2019 Xie Chen 0001, Xunying Liu, Yu Wang 0027, Anton Ragni, Jeremy Heng Meng Wong, Mark J. F. Gales
Exploiting Future Word Contexts in Neural Network Language Models for Speech Recognition.

TASLP2019 Jeremy Heng Meng Wong, Mark John Francis Gales, Yu Wang 0027, 
General Sequence Teacher-Student Learning.

ICASSP2019 Kate M. Knill, Mark J. F. Gales, P. P. Manakul, Andrew Caines, 
Automatic Grammatical Error Detection of Non-native Spoken Learner English.

ICASSP2019 Qiujia Li, Preben Ness, Anton Ragni, Mark J. F. Gales
Bi-directional Lattice Recurrent Neural Networks for Confidence Estimation.

Interspeech2019 Konstantinos Kyriakopoulos, Kate M. Knill, Mark J. F. Gales
A Deep Learning Approach to Automatic Characterisation of Rhythm in Non-Native English Speech.

Interspeech2019 Yiting Lu, Mark J. F. Gales, Kate M. Knill, P. P. Manakul, Linlin Wang, Yu Wang 0027, 
Impact of ASR Performance on Spoken Grammatical Error Detection.

SpeechComm2018 Yu Wang 0027, Mark J. F. Gales, Kate M. Knill, Konstantinos Kyriakopoulos, Andrey Malinin, Rogier C. van Dalen, M. Rashid, 
Towards automatic assessment of spontaneous spoken English.

#52  | Hsin-Min Wang | Google Scholar   DBLP
Venues: Interspeech: 27; ICASSP: 13; TASLP: 6; SpeechComm: 1
Years: 2022: 9; 2021: 5; 2020: 8; 2019: 6; 2018: 4; 2017: 9; 2016: 6
ISCA Sections: single-channel speech enhancement: 2; source separation: 2; speech synthesis: 2; speech enhancement: 2; voice conversion: 2; special session: 2; the voicemos challenge: 1; neural transducers, streaming asr and novel asr models: 1; speech intelligibility prediction for hearing-impaired listeners: 1; speech enhancement and intelligibility: 1; spoken machine translation: 1; voice conversion and adaptation: 1; noise reduction and intelligibility: 1; model training for asr: 1; neural techniques for voice conversion and waveform generation: 1; speech intelligibility and quality: 1; spoken document processing: 1; speech analysis and representation: 1; speech-enhancement: 1; discriminative training for asr: 1; spoken documents, spoken understanding and semantic analysis: 1
IEEE Keywords: speech enhancement: 7; speech recognition: 6; natural language processing: 4; convolutional neural nets: 3; deep neural network: 3; text analysis: 3; speaker recognition: 3; audio signal processing: 2; gaussian processes: 2; decoding: 2; signal denoising: 2; speaker verification: 2; summarization: 2; information retrieval: 2; spoken document: 2; optimisation: 2; audio visual systems: 1; recurrent neural nets: 1; asynchronous multimodal learning: 1; data compression: 1; audio visual: 1; low quality data: 1; data privacy: 1; medical signal processing: 1; non invasive: 1; deep learning (artificial intelligence): 1; sensor fusion: 1; electromyography: 1; multimodal: 1; anti spoofing: 1; partially fake audio detection: 1; biometrics (access control): 1; speech synthesis: 1; security of data: 1; audio deep synthesis detection challenge: 1; language model: 1; bert: 1; subspace based representation: 1; phonotactic language recognition: 1; support vector machines: 1; matrix decomposition: 1; subspace based learning: 1; multichannel speech enhancement: 1; raw waveform mapping: 1; microphones: 1; inner ear microphones: 1; phase estimation: 1; distributed microphones: 1; fully convolutional network (fcn): 1; deep neural networks: 1; ensemble learning: 1; decision trees: 1; generalizability: 1; dynamically sized decision tree: 1; statistics: 1; time delay neural network: 1; statistics pooling: 1; convolutional neural network: 1; multilayer perceptrons: 1; speaker identification: 1; articulatory feature: 1; regression analysis: 1; deep denoising autoencoder: 1; signal classification: 1; unsupervised learning: 1; character error rate: 1; reinforcement learning: 1; mean square error methods: 1; automatic speech recognition: 1; representation learning: 1; paragraph embedding: 1; unsupervised: 1; distilling: 1; query languages: 1; query modeling: 1; essence vector: 1; retrieval: 1; spoken document retrieval: 1; distill: 1; locality: 1; representation: 1; plda: 1; autoencoders: 1; discriminative training: 1; linear programming: 1; extractive summarization: 1; integer programming: 1; broadcasting: 1; manifold learning: 1; nonlinear dimension reduction: 1; local invariance: 1; filtering theory: 1; postfiltering: 1; locally linear embedding: 1; spectral analysis: 1; singing voice: 1; vowel likelihood: 1; glottal pulse shape: 1; lyrics alignment: 1; formant frequency: 1; vowel timbre examples: 1; acoustic phonetics: 1; f0 modification: 1; relevance: 1; relevance feedback: 1; document handling: 1; redundancy: 1; diversity: 1
Most Publications: 2022: 30; 2021: 30; 2019: 25; 2017: 25; 2020: 24

Affiliations
Academia Sinica, Taipei, Taiwan
National Taiwan University, Taipei, Taiwan (PhD 1995)

TASLP2022 Shang-Yi Chuang, Hsin-Min Wang, Yu Tsao 0001, 
Improved Lite Audio-Visual Speech Enhancement.

ICASSP2022 Kuan-Chen Wang, Kai-Chun Liu, Hsin-Min Wang, Yu Tsao 0001, 
EMGSE: Acoustic/EMG Fusion for Multimodal Speech Enhancement.

ICASSP2022 Haibin Wu, Heng-Cheng Kuo, Naijun Zheng, Kuo-Hsuan Hung, Hung-Yi Lee, Yu Tsao 0001, Hsin-Min Wang, Helen Meng, 
Partially Fake Audio Detection by Self-Attention-Based Fake Span Discovery.

Interspeech2022 Wen-Chin Huang, Erica Cooper, Yu Tsao 0001, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi, 
The VoiceMOS Challenge 2022.

Interspeech2022 Hung-Shin Lee, Pin-Tuan Huang, Yao-Fei Cheng, Hsin-Min Wang
Chain-based Discriminative Autoencoders for Speech Recognition.

Interspeech2022 Chi-Chang Lee, Cheng-Hung Hu, Yu-Chen Lin, Chu-Song Chen, Hsin-Min Wang, Yu Tsao 0001, 
NASTAR: Noise Adaptive Speech Enhancement with Target-Conditional Resampling.

Interspeech2022 Fan-Lin Wang, Hung-Shin Lee, Yu Tsao 0001, Hsin-Min Wang
Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks.

Interspeech2022 Ryandhimas Edo Zezario, Fei Chen 0011, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao 0001, 
MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids.

Interspeech2022 Ryandhimas Edo Zezario, Szu-Wei Fu, Fei Chen 0011, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao 0001, 
MTI-Net: A Multi-Target Speech Intelligibility Prediction Model.

ICASSP2021 Wen-Chin Huang, Chia-Hua Wu, Shang-Bao Luo, Kuan-Yu Chen, Hsin-Min Wang, Tomoki Toda, 
Speech Recognition by Simply Fine-Tuning Bert.

Interspeech2021 Yao-Fei Cheng, Hung-Shin Lee, Hsin-Min Wang
AlloST: Low-Resource Speech Translation Without Source Transcription.

Interspeech2021 Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Ching-Feng Liu, Yu Tsao 0001, Hsin-Min Wang, Tomoki Toda, 
A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion.

Interspeech2021 Fan-Lin Wang, Yu-Huai Peng, Hung-Shin Lee, Hsin-Min Wang
Dual-Path Filter Network: Speaker-Aware Modeling for Speech Separation.

Interspeech2021 Yi-Chiao Wu, Cheng-Hung Hu, Hung-Shin Lee, Yu-Huai Peng, Wen-Chin Huang, Yu Tsao 0001, Hsin-Min Wang, Tomoki Toda, 
Relational Data Selection for Data Augmentation of Speaker-Dependent Multi-Band MelGAN Vocoder.

TASLP2020 Hung-Shin Lee, Yu Tsao 0001, Shyh-Kang Jeng, Hsin-Min Wang
Subspace-Based Representation and Learning for Phonotactic Spoken Language Recognition.

TASLP2020 Chang-Le Liu, Sze-Wei Fu, You-Jin Li, Jen-Wei Huang, Hsin-Min Wang, Yu Tsao 0001, 
Multichannel Speech Enhancement by Raw Waveform-Mapping Using Fully Convolutional Networks.

TASLP2020 Cheng Yu, Ryandhimas E. Zezario, Syu-Siang Wang, Jonathan Sherman, Yi-Yen Hsieh, Xugang Lu, Hsin-Min Wang, Yu Tsao 0001, 
Speech Enhancement Based on Denoising Autoencoder With Multi-Branched Encoders.

ICASSP2020 Qian-Bei Hong, Chung-Hsien Wu, Hsin-Min Wang, Chien-Lin Huang, 
Statistics Pooling Time Delay Neural Network Based on X-Vector for Speaker Verification.

ICASSP2020 Qian-Bei Hong, Chung-Hsien Wu, Hsin-Min Wang, Chien-Lin Huang, 
Combining Deep Embeddings of Acoustic and Articulatory Features for Speaker Identification.

ICASSP2020 Ryandhimas E. Zezario, Tassadaq Hussain, Xugang Lu, Hsin-Min Wang, Yu Tsao 0001, 
Self-Supervised Denoising Autoencoder with Linear Regression Decoder for Speech Enhancement.

#53  | Ryo Masumura | Google Scholar   DBLP
Venues: Interspeech: 32; ICASSP: 13; TASLP: 1
Years: 2022: 8; 2021: 11; 2020: 8; 2019: 6; 2018: 5; 2017: 6; 2016: 2
ISCA Sections: dialog modeling: 2; other topics in speech recognition: 1; multi-, cross-lingual and other topics in asr: 1; speech synthesis: 1; spoken dialogue systems and multimodality: 1; single-channel speech enhancement: 1; speech emotion recognition: 1; novel models and training methods for asr: 1; spoken language processing: 1; voice activity detection and keyword spotting: 1; neural network training methods for asr: 1; streaming for asr/rnn transducers: 1; search/decoding techniques and confidence measures for asr: 1; applications in transcription, education and learning: 1; speech classification: 1; training strategies for asr: 1; asr neural network architectures and training: 1; spoken language understanding: 1; speech synthesis paradigms and methods: 1; training strategy for speech emotion recognition: 1; model training for asr: 1; dialogue speech understanding: 1; nn architectures for asr: 1; spoken term detection, confidence measure, and end-to-end speech recognition: 1; speaker characterization and analysis: 1; selected topics in neural speech processing: 1; asr systems and technologies: 1; prosody and text processing: 1; language understanding and generation: 1; language modeling for conversational speech and confidence measures: 1; language recognition: 1
IEEE Keywords: speech recognition: 12; recurrent neural nets: 5; natural language processing: 5; neural network: 3; probability: 3; knowledge distillation: 3; recurrent neural network transducer: 2; end to end: 2; emotion recognition: 2; speech emotion recognition: 2; transformer: 2; end to end automatic speech recognition: 2; attention based decoder: 1; perceived emotion: 1; listener adaptation: 1; language translation: 1; sequence to sequence pre training: 1; spoken text normalization: 1; text analysis: 1; pointer generator networks: 1; self supervised learning: 1; blind source separation: 1; audio visual: 1; speech separation: 1; audio signal processing: 1; and cross modal: 1; hierarchical encoder decoder: 1; large context end to end automatic speech recognition: 1; synchronisation: 1; whole network pre training: 1; entropy: 1; autoregressive processes: 1; long short term memory recurrent neural networks: 1; customer services: 1; customer satisfaction: 1; call centres: 1; customer satisfaction (cs): 1; contact center call: 1; hierarchical multi task model: 1; sequence level consistency training: 1; specaugment: 1; semi supervised learning: 1; speech codecs: 1; connectionist temporal classification: 1; attention weight: 1; speech coding: 1; attention based encoder decoder: 1; hierarchical recurrent encoder decoder: 1; soft target: 1; ambiguous emotional utterance: 1; lstm with attention: 1; graphs: 1; confusion networks: 1; robustness to asr errors: 1; spoken utterance classification: 1; domain adaptation: 1; dnn acoustic models: 1; dnns: 1; phonetic awareness: 1; lstm rnns: 1; senone dnns: 1; spoken language identification: 1
Most Publications: 2021: 28; 2020: 19; 2022: 18; 2019: 18; 2018: 13


ICASSP2022 Takafumi Moriya, Takanori Ashihara, Atsushi Ando, Hiroshi Sato, Tomohiro Tanaka, Kohei Matsuura, Ryo Masumura, Marc Delcroix, Takahiro Shinozaki, 
Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration.

Interspeech2022 Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data.

Interspeech2022 Ryo Masumura, Yoshihiro Yamazaki, Saki Mizuno, Naoki Makishima, Mana Ihori, Mihiro Uchida, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Shota Orihashi, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando, 
End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training.

Interspeech2022 Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari, 
Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis.

Interspeech2022 Fumio Nihei, Ryo Ishii, Yukiko I. Nakano, Kyosuke Nishida, Ryo Masumura, Atsushi Fukayama, Takao Nakamura, 
Dialogue Acts Aided Important Utterance Detection Based on Multiparty and Multimodal Information.

Interspeech2022 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.

Interspeech2022 Akihiko Takashima, Ryo Masumura, Atsushi Ando, Yoshihiro Yamazaki, Mihiro Uchida, Shota Orihashi, 
Interactive Co-Learning with Cross-Modal Transformer for Audio-Visual Emotion Recognition.

Interspeech2022 Tomohiro Tanaka, Ryo Masumura, Hiroshi Sato, Mana Ihori, Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, 
Domain Adversarial Self-Supervised Speech Representation Learning for Improving Unknown Domain Downstream Tasks.

ICASSP2021 Atsushi Ando, Ryo Masumura, Hiroshi Sato, Takafumi Moriya, Takanori Ashihara, Yusuke Ijima, Tomoki Toda, 
Speech Emotion Recognition Based on Listener Adaptive Models.

ICASSP2021 Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura
MAPGN: Masked Pointer-Generator Network for Sequence-to-Sequence Pre-Training.

ICASSP2021 Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, Ryo Masumura
Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss.

ICASSP2021 Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, 
Hierarchical Transformer-Based Large-Context End-To-End ASR with Large-Context Knowledge Distillation.

ICASSP2021 Takafumi Moriya, Takanori Ashihara, Tomohiro Tanaka, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Yusuke Ijima, Ryo Masumura, Yusuke Shinohara, 
Simpleflat: A Simple Whole-Network Pre-Training Approach for RNN Transducer-Based End-to-End Speech Recognition.

Interspeech2021 Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura
Zero-Shot Joint Modeling of Multiple Spoken-Text-Style Conversion Tasks Using Switching Tokens.

Interspeech2021 Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura
Enrollment-Less Training for Personalized Voice Activity Detection.

Interspeech2021 Ryo Masumura, Daiki Okamura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, 
Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation.

Interspeech2021 Takafumi Moriya, Tomohiro Tanaka, Takanori Ashihara, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Ryo Masumura, Marc Delcroix, Taichi Asami, 
Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture.

Interspeech2021 Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Takafumi Moriya, Takanori Ashihara, Shota Orihashi, Naoki Makishima, 
Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition.

Interspeech2021 Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Shota Orihashi, Naoki Makishima, 
End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning.

TASLP2020 Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono, Tomoki Toda, 
Customer Satisfaction Estimation in Contact Center Calls Based on a Hierarchical Multi-Task Model.

#54  | Hervé Bourlard | Google Scholar   DBLP
Venues: Interspeech: 20; ICASSP: 11; SpeechComm: 8; TASLP: 5
Years: 2022: 2; 2021: 5; 2020: 7; 2019: 6; 2018: 8; 2017: 4; 2016: 12
ISCA Sections: spoken term detection: 2; atypical speech analysis and detection: 1; novel models and training methods for asr: 1; openasr20 and low resource asr development: 1; neural network training methods and architectures for asr: 1; speech in health: 1; multilingual and code-switched asr: 1; speech and language analytics for medical applications: 1; model training for asr: 1; extracting information from audio: 1; plenary talk: 1; dereverberation: 1; adjusting to speaker, accent, and domain: 1; discriminative training for asr: 1; multi-lingual models and adaptation for asr: 1; decoding, system combination: 1; speech analysis: 1; speaker recognition: 1; acoustic modeling with neural networks: 1
IEEE Keyword: speech recognition: 8; medical disorders: 4; medical signal processing: 4; speech intelligibility: 4; diseases: 3; amyotrophic lateral sclerosis: 2; parkinson’s disease: 2; automatic speech recognition: 2; cerebral palsy: 2; speaker recognition: 2; speech: 2; dtw: 2; weibull distribution: 2; neurophysiology: 2; svm: 2; speech enhancement: 2; patient diagnosis: 2; deep neural network: 2; probability: 2; speech coding: 2; hidden markov models: 2; sparse coding: 2; i vector: 2; speaker diarization: 2; audio recording: 2; variational autoencoders: 1; speech dereverberation: 1; poisson distribution: 1; monte carlo methods: 1; matrix decomposition: 1; non negative matrix factorization: 1; expectation maximisation algorithm: 1; reverberation: 1; convolutional neural network: 1; convolutional neural nets: 1; matrix algebra: 1; dysarthria: 1; pairwise distance: 1; supervised learning: 1; self supervised pretraining: 1; lfmmi: 1; natural language processing: 1; cross lingual adaptation: 1; regression analysis: 1; spectral modulation: 1; spectral subspace: 1; hearing impairment: 1; svd: 1; parametric sparsity: 1; support vector machines: 1; non parametric sparsity: 1; patient treatment: 1; medical diagnostic computing: 1; parkinson's disease: 1; entropy: 1; medical signal detection: 1; speech synthesis: 1; tts: 1; p estoi: 1; multi genre speech recognition: 1; semi supervised learning: 1; incremental training: 1; estoi: 1; pathological speech intelligibility: 1; stoi: 1; cepstral analysis: 1; super gaussianity: 1; signal classification: 1; statistical distributions: 1; end to end lf mmi: 1; multilingual asr: 1; language adaptive training: 1; ctc: 1; sparse representation: 1; subspace detection: 1; query processing: 1; dynamic programming: 1; sparse recovery modeling: 1; spoken term detection: 1; subspace regularization: 1; posterior probabilities: 1; query by example: 1; estimation theory: 1; information transmission: 1; motor speech disorders: 1; correlation methods: 1; speech production: 1; speech perception: 1; gaussian processes: 1; untranscribed data: 1; soft targets: 1; mixture models: 1; principle component analysis: 1; audio databases: 1; microphones: 1; speaker linking: 1; sensor fusion: 1; ward clustering: 1; gaussian mixture model (gmm): 1; information bottleneck (ib): 1; joint factor analysis (jfa): 1; acoustic modeling: 1; union of low dimensional subspaces: 1; dictionary learning: 1; fusion: 1; ward: 1; longitudinal: 1; clustering: 1; linking: 1; television broadcasting: 1
Most Publications: 2014: 21; 2016: 17; 2004: 17; 2002: 16; 2012: 15


Interspeech2022 Cécile Fougeron, Nicolas Audibert, Ina Kodrasi, Parvaneh Janbakhshi, Michaela Pernon, Nathalie Lévêque, Stephanie Borel, Marina Laganaro, Hervé Bourlard, Frédéric Assal, 
Comparison of 5 methods for the evaluation of intelligibility in mild to moderate French dysarthric speech.

Interspeech2022 Selen Hande Kabil, Hervé Bourlard
From Undercomplete to Sparse Overcomplete Autoencoders to Improve LF-MMI based Speech Recognition.

ICASSP2021 Deepak Baby, Hervé Bourlard
Speech Dereverberation Using Variational Autoencoders.

ICASSP2021 Parvaneh Janbakhshi, Ina Kodrasi, Hervé Bourlard
Automatic Dysarthric Speech Detection Exploiting Pairwise Distance-Based Convolutional Neural Networks.

ICASSP2021 Apoorv Vyas, Srikanth R. Madikeri, Hervé Bourlard
Lattice-Free Mmi Adaptation of Self-Supervised Pretrained Acoustic Models.

Interspeech2021 Srikanth R. Madikeri, Petr Motlícek, Hervé Bourlard
Multitask Adaptation with Lattice-Free MMI for Multi-Genre Speech Recognition of Low Resource Languages.

Interspeech2021 Apoorv Vyas, Srikanth R. Madikeri, Hervé Bourlard
Comparing CTC and LFMMI for Out-of-Domain Adaptation of wav2vec 2.0 Acoustic Model.

SpeechComm2020 Pranay Dighe, Afsaneh Asaei, Hervé Bourlard
On quantifying the quality of acoustic models in hybrid DNN-HMM ASR.

TASLP2020 Parvaneh Janbakhshi, Ina Kodrasi, Hervé Bourlard
Automatic Pathological Speech Intelligibility Assessment Exploiting Subspace-Based Analyses.

TASLP2020 Ina Kodrasi, Hervé Bourlard
Spectro-Temporal Sparsity Characterization for Dysarthric Speech Detection.

ICASSP2020 Parvaneh Janbakhshi, Ina Kodrasi, Hervé Bourlard
Synthetic Speech References for Automatic Pathological Speech Intelligibility Assessment.

ICASSP2020 Banriskhem K. Khonglah, Srikanth R. Madikeri, Subhadeep Dey, Hervé Bourlard, Petr Motlícek, Jayadev Billa, 
Incremental Semi-Supervised Learning for Multi-Genre Speech Recognition.

Interspeech2020 Ina Kodrasi, Michaela Pernon, Marina Laganaro, Hervé Bourlard
Automatic Discrimination of Apraxia of Speech and Dysarthria Using a Minimalistic Set of Handcrafted Features.

Interspeech2020 Srikanth R. Madikeri, Banriskhem K. Khonglah, Sibo Tong, Petr Motlícek, Hervé Bourlard, Daniel Povey, 
Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems.

SpeechComm2019 Pranay Dighe, Afsaneh Asaei, Hervé Bourlard
Low-rank and sparse subspace modeling of speech for DNN based acoustic modeling.

ICASSP2019 Parvaneh Janbakhshi, Ina Kodrasi, Hervé Bourlard
Pathological Speech Intelligibility Assessment Based on the Short-time Objective Intelligibility Measure.

ICASSP2019 Ina Kodrasi, Hervé Bourlard
Super-gaussianity of Speech Spectral Coefficients as a Potential Biomarker for Dysarthric Speech Detection.

ICASSP2019 Sibo Tong, Philip N. Garner, Hervé Bourlard
An Investigation of Multilingual ASR Using End-to-end LF-MMI.

Interspeech2019 Parvaneh Janbakhshi, Ina Kodrasi, Hervé Bourlard
Spectral Subspace Analysis for Automatic Assessment of Pathological Speech Intelligibility.

Interspeech2019 Sibo Tong, Apoorv Vyas, Philip N. Garner, Hervé Bourlard
Unbiased Semi-Supervised LF-MMI Training Using Dropout.

#55  | Jan Cernocký | Google Scholar   DBLP
Venues: Interspeech: 29; ICASSP: 14; TASLP: 1
Years: 2022: 6; 2021: 9; 2020: 3; 2019: 7; 2018: 5; 2017: 7; 2016: 7
ISCA Section: speaker recognition and diarization: 2; automatic speech recognition in air traffic management: 2; speaker characterization and recognition: 2; special session: 2; self-supervised, semi-supervised, adaptation and data augmentation for asr: 1; search/decoding algorithms for asr: 1; robust speaker recognition: 1; linguistic components in end-to-end asr: 1; graph and end-to-end learning for speaker recognition: 1; embedding and network architecture for speaker recognition: 1; target speaker detection, localization and separation: 1; sequence-to-sequence speech recognition: 1; spoken term detection, confidence measure, and end-to-end speech recognition: 1; zero-resource asr: 1; speaker recognition: 1; the 2019 automatic speaker verification spoofing and countermeasures challenge: 1; topics in speech recognition: 1; dereverberation: 1; low resource speech recognition challenge for indian languages: 1; neural network training strategies for asr: 1; multi-lingual models and adaptation for asr: 1; neural network acoustic models for asr: 1; spoken documents, spoken understanding and semantic analysis: 1; language recognition: 1; robustness in speech processing: 1
IEEE Keyword: speaker recognition: 7; speech recognition: 7; bayes methods: 4; acoustic unit discovery: 3; hidden markov models: 3; variational bayes: 3; beamforming: 2; speaker verification: 2; natural language processing: 2; joint training: 2; pattern clustering: 2; unsupervised learning: 2; speaker diarization: 2; optimisation: 2; recurrent neural nets: 2; i vector: 2; cross domain: 1; blind source separation: 1; speech separation: 1; dpccn: 1; mixture remix: 1; frequency domain analysis: 1; unsupervised target speech extraction: 1; time domain analysis: 1; multi channel: 1; array signal processing: 1; speech enhancement: 1; dataset: 1; multisv: 1; sequence to sequence: 1; speech synthesis: 1; self supervision: 1; cycle consistency: 1; language translation: 1; spoken language translation: 1; coupled decoding: 1; end to end differentiable pipeline: 1; asr objective: 1; auxiliary loss: 1; transformers: 1; how2 dataset: 1; hierarchical subspace model: 1; clustering: 1; on the fly data augmentation: 1; speaker embedding: 1; specaugment: 1; convolutional neural nets: 1; probability: 1; linear discriminant analysis: 1; hmm: 1; dihard: 1; x vector: 1; softmax margin: 1; sequence learning: 1; discriminative training: 1; attention models: 1; beam search training: 1; spatial filters: 1; speaker adaptive neural network: 1; speaker extraction: 1; residual memory networks: 1; lstm: 1; rnn: 1; automatic speech recognition: 1; computational complexity: 1; feedforward neural nets: 1; nonparametric statistics: 1; non parametric bayesian models: 1; document handling: 1; topic identification: 1; audio signal processing: 1; bayesian non parametric: 1; gaussian processes: 1; automatic speaker identification: 1; mixture models: 1; bottleneck features: 1; deep neural networks: 1; ssnn: 1; sequence summary: 1; adaptation: 1; dnn: 1
Most Publications: 2021: 20; 2019: 18; 2022: 17; 2018: 17; 2016: 12

Affiliations
Brno University of Technology

ICASSP2022 Jiangyu Han, Yanhua Long, Lukás Burget, Jan Cernocký
DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation and Extraction.

ICASSP2022 Ladislav Mosner, Oldrich Plchot, Lukás Burget, Jan Honza Cernocký
Multisv: Dataset for Far-Field Multi-Channel Speaker Verification.

Interspeech2022 Murali Karthick Baskar, Tim Herzig, Diana Nguyen, Mireia Díez, Tim Polzehl, Lukás Burget, Jan Cernocký
Speaker adaptation for Wav2vec2 based dysarthric ASR.

Interspeech2022 Martin Kocour, Katerina Zmolíková, Lucas Ondel, Jan Svec, Marc Delcroix, Tsubasa Ochiai, Lukás Burget, Jan Cernocký
Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model.

Interspeech2022 Junyi Peng, Rongzhi Gu, Ladislav Mosner, Oldrich Plchot, Lukás Burget, Jan Cernocký
Learnable Sparse Filterbank for Speaker Verification.

Interspeech2022 Themos Stafylakis, Ladislav Mosner, Oldrich Plchot, Johan Rohdin, Anna Silnova, Lukás Burget, Jan Cernocký
Training speaker embedding extractors using multi-speaker audio with unknown speaker boundaries.

ICASSP2021 Murali Karthick Baskar, Lukás Burget, Shinji Watanabe 0001, Ramón Fernandez Astudillo, Jan Honza Cernocký
Eat: Enhanced ASR-TTS for Self-Supervised Speech Recognition.

ICASSP2021 Hari Krishna Vydana, Martin Karafiát, Katerina Zmolíková, Lukás Burget, Honza Cernocký
Jointly Trained Transformers Models for Spoken Language Translation.

ICASSP2021 Bolaji Yusuf, Lucas Ondel, Lukás Burget, Jan Cernocký, Murat Saraçlar, 
A Hierarchical Subspace Model for Language-Attuned Acoustic Unit Discovery.

Interspeech2021 Ekaterina Egorova, Hari Krishna Vydana, Lukás Burget, Jan Cernocký
Out-of-Vocabulary Words Detection with Attention and CTC Alignments in an End-to-End ASR System.

Interspeech2021 Martin Kocour, Karel Veselý, Alexander Blatt, Juan Zuluaga-Gomez, Igor Szöke, Jan Cernocký, Dietrich Klakow, Petr Motlícek, 
Boosting of Contextual Information in ASR for Air-Traffic Call-Sign Recognition.

Interspeech2021 Junyi Peng, Xiaoyang Qu, Rongzhi Gu, Jianzong Wang, Jing Xiao 0006, Lukás Burget, Jan Cernocký
Effective Phase Encoding for End-To-End Speaker Verification.

Interspeech2021 Junyi Peng, Xiaoyang Qu, Jianzong Wang, Rongzhi Gu, Jing Xiao 0006, Lukás Burget, Jan Cernocký
ICSpk: Interpretable Complex Speaker Embedding Extractor from Raw Waveform.

Interspeech2021 Igor Szöke, Santosh Kesiraju, Ondrej Novotný, Martin Kocour, Karel Veselý, Jan Cernocký
Detecting English Speech in the Air Traffic Control Voice Communication.

Interspeech2021 Katerina Zmolíková, Marc Delcroix, Desh Raj, Shinji Watanabe 0001, Jan Cernocký
Auxiliary Loss Function for Target Speech Extraction and Recognition with Weak Supervision Based on Speaker Characteristics.

TASLP2020 Mireia Díez, Lukás Burget, Federico Landini, Jan Cernocký
Analysis of Speaker Diarization Based on Bayesian HMM With Eigenvoice Priors.

ICASSP2020 Shuai Wang 0016, Johan Rohdin, Oldrich Plchot, Lukás Burget, Kai Yu 0004, Jan Cernocký
Investigation of Specaugment for Deep Speaker Embedding Learning.

ICASSP2020 Mireia Díez, Lukás Burget, Federico Landini, Shuai Wang 0016, Honza Cernocký
Optimizing Bayesian Hmm Based X-Vector Clustering for the Second Dihard Speech Diarization Challenge.

ICASSP2019 Murali Karthick Baskar, Lukás Burget, Shinji Watanabe 0001, Martin Karafiát, Takaaki Hori, Jan Honza Cernocký
Promising Accurate Prefix Boosting for Sequence-to-sequence ASR.

Interspeech2019 Murali Karthick Baskar, Shinji Watanabe 0001, Ramón Fernandez Astudillo, Takaaki Hori, Lukás Burget, Jan Cernocký
Semi-Supervised Sequence-to-Sequence ASR Using Unpaired Speech and Text.

#56  | Jonathan Le Roux | Google Scholar   DBLP
Venues: ICASSP: 22; Interspeech: 17; TASLP: 3; ACL: 1
Years: 2022: 5; 2021: 8; 2020: 8; 2019: 8; 2018: 6; 2017: 4; 2016: 4
ISCA Section: spoken dialogue systems: 1; source separation: 1; self-supervision and semi-supervision for neural asr training: 1; acoustic event detection and acoustic scene classification: 1; novel neural network architectures for asr: 1; streaming for asr/rnn transducers: 1; asr neural network architectures: 1; privacy and security in speech communication: 1; diarization: 1; end-to-end speech recognition: 1; search methods for speech recognition: 1; speech technologies for code-switching in multilingual communities: 1; speech and audio source separation and scene analysis: 1; spatial and phase cues for source separation and speech recognition: 1; noise robust speech recognition: 1; far-field speech processing: 1; source separation and spatial audio: 1
IEEE Keyword: speech recognition: 13; source separation: 7; natural language processing: 5; recurrent neural nets: 5; speaker recognition: 5; audio signal processing: 5; deep clustering: 5; cocktail party problem: 4; end to end asr: 3; graph theory: 3; self training: 3; speech separation: 3; transformer: 3; end to end: 3; pattern clustering: 3; pattern classification: 2; wfst: 2; ctc: 2; iterative methods: 2; pseudo labeling: 2; end to end speech recognition: 2; time frequency analysis: 2; reverberation: 2; automatic speech recognition: 2; music: 2; triggered attention: 2; signal classification: 2; chimera network: 2; speaker independent speech separation: 2; speech coding: 2; speech enhancement: 2; speaker independent multi talker speech separation: 2; multi speaker overlapped speech: 1; gtc: 1; semi supervised learning (artificial intelligence): 1; semi supervised learning: 1; asr: 1; gtc t: 1; transducer: 1; rnn t: 1; supervised learning: 1; regression analysis: 1; speech dereverberation: 1; blind deconvolution: 1; rir estimation: 1; deep learning (artificial intelligence): 1; filtering theory: 1; domain adaptation: 1; self supervised asr: 1; dropout: 1; iterative pseudo labeling: 1; language translation: 1; computational complexity: 1; dilated self attention: 1; semi supervised asr: 1; graph based temporal classification: 1; probability: 1; blind source separation: 1; sound event detection: 1; mask inference: 1; weak supervision: 1; decoding: 1; neural beamforming: 1; overlapped speech recognition: 1; audio coding: 1; streaming: 1; semi supervised classification: 1; weakly labeled data: 1; audio source separation: 1; unsupervised speaker adaptation: 1; speaker memory: 1; neural turing machine: 1; turing machines: 1; low latency: 1; unpaired data: 1; expert systems: 1; cycle consistency: 1; frame synchronous decoding: 1; attention mechanism: 1; end to end automatic speech recognition: 1; connectionist temporal classification: 1; signal to noise ratio: 1; signal denoising: 1; objective measure: 1; computational linguistics: 1; neural net architecture: 1; hybrid attention/ctc: 1; language identification: 1; language independent architecture: 1; multilingual asr: 1; human computer interaction: 1; spatial clustering: 1; microphone arrays: 1; hidden semi markov model (hsmm): 1; polyphonic sound event detection (sed): 1; hidden markov models: 1; hybrid model: 1; recurrent neural network: 1; long short term memory (lstm): 1; duration control: 1; estimation theory: 1; music separation: 1; approximation theory: 1; singing voice separation: 1; chime 4: 1; student teacher learning: 1; distant talking asr: 1; self supervised learning: 1; multi access systems: 1; distance learning: 1; optimisation: 1; embedding: 1; clustering: 1; multichannel gmm: 1; gaussian processes: 1; deep unfolding: 1; markov random field: 1; mixture models: 1; markov processes: 1
Most Publications: 2021: 28; 2022: 23; 2019: 23; 2020: 21; 2018: 14


ICASSP2022 Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR.

ICASSP2022 Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori, 
Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy.

ICASSP2022 Niko Moritz, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux
Sequence Transduction with Graph-Based Supervision.

Interspeech2022 Chiori Hori, Takaaki Hori, Jonathan Le Roux
Low-Latency Online Streaming VideoQA Using Audio-Visual Transformers.

Interspeech2022 Efthymios Tzinis, Gordon Wichern, Aswin Shanmugam Subramanian, Paris Smaragdis, Jonathan Le Roux
Heterogeneous Target Speech Separation.

TASLP2021 Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux
Convolutive Prediction for Monaural Speech Dereverberation and Noisy-Reverberant Speaker Separation.

ICASSP2021 Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux
Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training.

ICASSP2021 Niko Moritz, Takaaki Hori, Jonathan Le Roux
Capturing Multi-Resolution Context by Dilated Self-Attention.

ICASSP2021 Niko Moritz, Takaaki Hori, Jonathan Le Roux
Semi-Supervised Speech Recognition Via Graph-Based Temporal Classification.

Interspeech2021 Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori, 
Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition.

Interspeech2021 Chiori Hori, Takaaki Hori, Jonathan Le Roux
Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers.

Interspeech2021 Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux
Advanced Long-Context End-to-End Speech Recognition Using Context-Expanded Transformers.

Interspeech2021 Niko Moritz, Takaaki Hori, Jonathan Le Roux
Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition.

TASLP2020 Fatemeh Pishdadian, Gordon Wichern, Jonathan Le Roux
Finding Strength in Weakness: Learning to Separate Sounds With Weak Supervision.

ICASSP2020 Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe 0001, 
End-To-End Multi-Speaker Speech Recognition With Transformer.

ICASSP2020 Niko Moritz, Takaaki Hori, Jonathan Le Roux
Streaming Automatic Speech Recognition with the Transformer Model.

ICASSP2020 Fatemeh Pishdadian, Gordon Wichern, Jonathan Le Roux
Learning to Separate Sounds from Weakly Labeled Scenes.

ICASSP2020 Leda Sari, Niko Moritz, Takaaki Hori, Jonathan Le Roux
Unsupervised Speaker Adaptation Using Attention-Based Speaker Memory for End-to-End ASR.

Interspeech2020 Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux
Transformer-Based Long-Context End-to-End Speech Recognition.

Interspeech2020 Tejas Jayashankar, Jonathan Le Roux, Pierre Moulin, 
Detecting Audio Attacks on ASR Systems with Dropout Uncertainty.

#57  | Hisashi Kawai | Google Scholar   DBLP
Venues: Interspeech: 24; ICASSP: 12; TASLP: 4; SpeechComm: 2
Years: 2022: 2; 2021: 6; 2020: 4; 2019: 10; 2018: 9; 2017: 2; 2016: 9
ISCA Section: speech synthesis: 5; speaker and language recognition: 1; topics in asr: 1; large-scale evaluation of short-duration speaker verification: 1; cross-lingual and multilingual asr: 1; asr for noisy and far-field speech: 1; spoken term detection, confidence measure, and end-to-end speech recognition: 1; nn architectures for asr: 1; speech enhancement: 1; speech and audio classification: 1; acoustic modelling: 1; audio events and acoustic scenes: 1; text analysis, multilingual issues and evaluation in speech synthesis: 1; language identification: 1; prosody and text processing: 1; speaker and language recognition applications: 1; language modeling for conversational speech and confidence measures: 1; decoding, system combination: 1; language recognition: 1; co-inference of production and acoustics: 1
IEEE Keyword: speech recognition: 7; speech synthesis: 6; speaker recognition: 4; vocoders: 4; neural vocoder: 3; spoken language identification: 3; knowledge distillation: 3; pattern classification: 2; autoregressive processes: 2; speech intelligibility: 2; speech enhancement: 2; noise shaping: 2; filtering theory: 2; acoustic model: 2; connectionist temporal classification: 2; generative model: 1; bayes methods: 1; speaker verification: 1; affine transforms: 1; discriminative model: 1; joint bayesian model: 1; pitch dependent dilated convolution: 1; parallel wavegan: 1; quasi periodic wavenet: 1; convolutional neural nets: 1; optimal transport: 1; unsupervised domain adaptation: 1; statistical distributions: 1; dysarthria: 1; text to speech: 1; voice conversion: 1; medical disorders: 1; probability: 1; wavegrad: 1; diffusion probabilistic vocoder: 1; sub modeling: 1; diffwave: 1; noise: 1; short utterances: 1; internal representation learning: 1; natural language processing: 1; sequence to sequence model: 1; transformer: 1; weighted forced attention: 1; forced alignment: 1; gaussian inverse autoregressive flow: 1; gaussian processes: 1; fast fourier transforms: 1; fftnet: 1; parallel wavenet: 1; interactive teacher student learning: 1; computer aided instruction: 1; teacher model optimization: 1; short utterance feature representation: 1; natural languages: 1; optimisation: 1; end to end speech enhancement: 1; mean square error methods: 1; fully convolutional neural network: 1; automatic speech recognition: 1; raw waveform: 1; speech coding: 1; entire audible frequency range: 1; multirate signal processing: 1; graphics processing units: 1; vocoder: 1; subband wavenet: 1; sampling methods: 1; perceptual weighting: 1; wavenet: 1; white noise: 1; quantisation (signal): 1; convolution: 1; noise analysis: 1; feedforward neural nets: 1; recurrent neural nets: 1; long short term memory: 1; hidden markov models: 1; signal classification: 1; conditional entropy: 1; entropy: 1; loss function: 1; linear transformation network: 1; deep neural network: 1; singular value decomposition: 1; speaker adaptive training: 1; lfda: 1; discriminant analysis: 1; language identification: 1; i vector: 1
Most Publications: 2010: 25; 2019: 21; 2021: 15; 2018: 15; 2011: 15


SpeechComm2022 Takuma Okamoto, Keisuke Matsubara, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai
Neural speech-rate conversion with multispeaker WaveNet vocoder.

Interspeech2022 Peng Shen, Xugang Lu, Hisashi Kawai
Transducer-based language embedding for spoken language identification.

TASLP2021 Xugang Lu, Peng Shen, Yu Tsao 0001, Hisashi Kawai
Coupling a Generative Model With a Discriminative Learning Framework for Speaker Verification.

TASLP2021 Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda, 
Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.

ICASSP2021 Xugang Lu, Peng Shen, Yu Tsao 0001, Hisashi Kawai
Unsupervised Neural Adaptation Model Based on Optimal Transport for Spoken Language Identification.

ICASSP2021 Keisuke Matsubara, Takuma Okamoto, Ryoichi Takashima, Tetsuya Takiguchi, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai
High-Intelligibility Speech Synthesis for Dysarthric Speakers with LPCNet-Based TTS and CycleVAE-Based VC.

ICASSP2021 Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai
Noise Level Limited Sub-Modeling for Diffusion Probabilistic Vocoders.

Interspeech2021 Masakiyo Fujimoto, Hisashi Kawai
Noise Robust Acoustic Modeling for Single-Channel Speech Recognition Based on a Stream-Wise Transformer Architecture.

TASLP2020 Peng Shen, Xugang Lu, Sheng Li 0010, Hisashi Kawai
Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification.

ICASSP2020 Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai
Transformer-Based Text-to-Speech with Weighted Forced Attention.

Interspeech2020 Peng Shen, Xugang Lu, Hisashi Kawai
Investigation of NICT Submission for Short-Duration Speaker Verification Challenge 2020.

Interspeech2020 Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda, 
Quasi-Periodic Parallel WaveGAN Vocoder: A Non-Autoregressive Pitch-Dependent Dilated Convolution Model for Parametric Speech Generation.

ICASSP2019 Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai
Investigations of Real-time Gaussian Fftnet and Parallel Wavenet Neural Vocoders with Simple Acoustic Features.

ICASSP2019 Peng Shen, Xugang Lu, Sheng Li 0010, Hisashi Kawai
Interactive Learning of Teacher-student Model for Short Utterance Spoken Language Identification.

Interspeech2019 Sheng Li 0010, Chenchen Ding, Xugang Lu, Peng Shen, Tatsuya Kawahara, Hisashi Kawai
End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition.

Interspeech2019 Masakiyo Fujimoto, Hisashi Kawai
One-Pass Single-Channel Noisy Speech Recognition Using a Combination of Noisy and Enhanced Features.

Interspeech2019 Sheng Li 0010, Xugang Lu, Chenchen Ding, Peng Shen, Tatsuya Kawahara, Hisashi Kawai
Investigating Radical-Based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese.

Interspeech2019 Sheng Li 0010, Raj Dabre, Xugang Lu, Peng Shen, Tatsuya Kawahara, Hisashi Kawai
Improving Transformer-Based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation.

Interspeech2019 Chien-Feng Liao, Yu Tsao 0001, Xugang Lu, Hisashi Kawai
Incorporating Symbolic Sequential Modeling for Speech Enhancement.

Interspeech2019 Xugang Lu, Peng Shen, Sheng Li 0010, Yu Tsao 0001, Hisashi Kawai
Class-Wise Centroid Distance Metric Learning for Acoustic Event Detection.

#58  | Takaaki Hori | Google Scholar   DBLP
Venues: ICASSP: 19; Interspeech: 16; TASLP: 3; ACL: 2; SpeechComm: 1; ICML: 1
Years: 2022: 4; 2021: 7; 2020: 5; 2019: 11; 2018: 5; 2017: 7; 2016: 3
ISCA Section: spoken dialogue systems: 1; self-supervision and semi-supervision for neural asr training: 1; acoustic event detection and acoustic scene classification: 1; novel neural network architectures for asr: 1; streaming for asr/rnn transducers: 1; asr neural network architectures: 1; diarization: 1; sequence-to-sequence speech recognition: 1; emotion and personality in conversation: 1; spoken term detection, confidence measure, and end-to-end speech recognition: 1; end-to-end speech recognition: 1; search methods for speech recognition: 1; speech technologies for code-switching in multilingual communities: 1; recurrent neural models for asr: 1; neural network acoustic models for asr: 1; spoken language understanding systems: 1
IEEE Keyword: speech recognition: 20; natural language processing: 6; recurrent neural nets: 5; end to end speech recognition: 4; end to end asr: 3; graph theory: 3; self training: 3; automatic speech recognition: 3; speech coding: 3; connectionist temporal classification: 3; end to end: 3; pattern classification: 2; wfst: 2; ctc: 2; pseudo labeling: 2; transformer: 2; joint ctc/attention: 2; microphone arrays: 2; triggered attention: 2; speaker recognition: 2; speech enhancement: 2; hidden markov models: 2; probability: 2; multi speaker overlapped speech: 1; gtc: 1; iterative methods: 1; semi supervised learning (artificial intelligence): 1; semi supervised learning: 1; asr: 1; gtc t: 1; transducer: 1; rnn t: 1; domain adaptation: 1; self supervised asr: 1; dropout: 1; iterative pseudo labeling: 1; language translation: 1; computational complexity: 1; dilated self attention: 1; semi supervised asr: 1; graph based temporal classification: 1; multi encoder multi array (mem array): 1; encoder decoder: 1; multi encoder multi resolution (mem res): 1; encoding: 1; hierarchical attention network (han): 1; audio coding: 1; streaming: 1; unsupervised speaker adaptation: 1; speaker memory: 1; neural turing machine: 1; turing machines: 1; optimisation: 1; softmax margin: 1; sequence learning: 1; discriminative training: 1; attention models: 1; beam search training: 1; cold fusion: 1; storage management: 1; deep fusion: 1; sequence to sequence: 1; shallow fusion: 1; language model: 1; automatic speech recognition (asr): 1; unpaired data: 1; expert systems: 1; cycle consistency: 1; frame synchronous decoding: 1; attention mechanism: 1; end to end automatic speech recognition: 1; signal classification: 1; decoding: 1; multiple microphone array: 1; error statistics: 1; speech codecs: 1; stream attention: 1; multichannel end to end asr: 1; speaker adaptation: 1; attention based encoder decoder: 1; neural beamformer: 1; computational linguistics: 1; neural net architecture: 1; hybrid attention/ctc: 1; language identification: 1; language independent architecture: 1; multilingual asr: 1; speaker independent multi talker speech separation: 1; human computer interaction: 1; cocktail party problem: 1; deep clustering: 1; hidden semi markov model (hsmm): 1; polyphonic sound event detection (sed): 1; hybrid model: 1; recurrent neural network: 1; long short term memory (lstm): 1; duration control: 1; multi task learning: 1; speech synthesis: 1; attention: 1; chime 4: 1; student teacher learning: 1; distant talking asr: 1; self supervised learning: 1; multi access systems: 1; distance learning: 1; estimation theory: 1; conditional random fields: 1; word alignment network: 1; correlation methods: 1; error type classification: 1; recognition accuracy estimation: 1; maximum likelihood estimation: 1; long short term memory: 1; recurrent neural network language model: 1; entropy: 1; minimum word error training: 1
Most Publications: 2018: 23; 2019: 20; 2017: 20; 2021: 15; 2012: 14


ICASSP2022 Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux, 
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR.

ICASSP2022 Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori
Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy.

ICASSP2022 Niko Moritz, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux, 
Sequence Transduction with Graph-Based Supervision.

Interspeech2022 Chiori Hori, Takaaki Hori, Jonathan Le Roux, 
Low-Latency Online Streaming VideoQA Using Audio-Visual Transformers.

ICASSP2021 Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training.

ICASSP2021 Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Capturing Multi-Resolution Context by Dilated Self-Attention.

ICASSP2021 Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Semi-Supervised Speech Recognition Via Graph-Based Temporal Classification.

Interspeech2021 Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori
Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition.

Interspeech2021 Chiori Hori, Takaaki Hori, Jonathan Le Roux, 
Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers.

Interspeech2021 Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux, 
Advanced Long-Context End-to-End Speech Recognition Using Context-Expanded Transformers.

Interspeech2021 Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition.

TASLP2020 Ruizhi Li, Xiaofei Wang 0007, Sri Harish Mallidi, Shinji Watanabe 0001, Takaaki Hori, Hynek Hermansky, 
Multi-Stream End-to-End Speech Recognition.

ICASSP2020 Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Streaming Automatic Speech Recognition with the Transformer Model.

ICASSP2020 Leda Sari, Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Unsupervised Speaker Adaptation Using Attention-Based Speaker Memory for End-to-End ASR.

Interspeech2020 Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux, 
Transformer-Based Long-Context End-to-End Speech Recognition.

Interspeech2020 Niko Moritz, Gordon Wichern, Takaaki Hori, Jonathan Le Roux, 
All-in-One Transformer: Unifying Speech Recognition, Audio Tagging, and Event Detection.

ICASSP2019 Murali Karthick Baskar, Lukás Burget, Shinji Watanabe 0001, Martin Karafiát, Takaaki Hori, Jan Honza Cernocký, 
Promising Accurate Prefix Boosting for Sequence-to-sequence ASR.

ICASSP2019 Jaejin Cho, Shinji Watanabe 0001, Takaaki Hori, Murali Karthick Baskar, Hirofumi Inaguma, Jesús Villalba 0001, Najim Dehak, 
Language Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition.

ICASSP2019 Takaaki Hori, Ramón Fernandez Astudillo, Tomoki Hayashi, Yu Zhang 0033, Shinji Watanabe 0001, Jonathan Le Roux, 
Cycle-consistency Training for End-to-end Speech Recognition.

ICASSP2019 Niko Moritz, Takaaki Hori, Jonathan Le Roux, 
Triggered Attention for End-to-end Speech Recognition.

#59  | Kong-Aik Lee | Google Scholar   DBLP
Venues — Interspeech: 25; ICASSP: 12; TASLP: 4; SpeechComm: 1
Years — 2022: 6; 2021: 5; 2020: 9; 2019: 6; 2018: 5; 2017: 7; 2016: 4
ISCA Section — robust speaker recognition: 3; speaker recognition evaluation: 2; speaker recognition: 2; speaker recognition and diarization: 2; speaker verification: 2; special session: 2; short utterances speaker recognition: 2; voice anti-spoofing and countermeasure: 1; feature, embedding and neural architecture for speaker recognition: 1; anti-spoofing and liveness detection: 1; speaker recognition challenges and applications: 1; the attacker’s perspective on automatic speaker verification: 1; large-scale evaluation of short-duration speaker verification: 1; dnn architectures for speaker recognition: 1; learning techniques for speaker recognition: 1; the 2019 automatic speaker verification spoofing and countermeasures challenge: 1; language recognition: 1
IEEE Keyword — speaker recognition: 13; speaker verification: 5; natural language processing: 4; probability: 4; speech recognition: 3; pattern classification: 2; unsupervised learning: 2; meta learning: 2; optimisation: 2; security of data: 2; spoken language recognition: 2; domain adaptation: 2; gaussian processes: 2; multi scale frequency channel attention: 1; convolutional neural nets: 1; text independent speaker verification: 1; short utterance: 1; self supervised speaker recognition: 1; pseudo label selection: 1; loss gated learning: 1; multi speaker asr: 1; alimeeting: 1; speaker diarization: 1; m2met: 1; meeting transcription: 1; microphone arrays: 1; domain invariant: 1; multilayer perceptrons: 1; meta generalized transformation: 1; cross channel: 1; meta speaker embedding network: 1; spoofing counter measures: 1; automatic speaker verification (asv): 1; presentation attack detection: 1; detection cost function: 1; backpropagation: 1; convolutional recurrent neural network: 1; deep bottleneck features: 1; speech articulatory attributes: 1; maximal figure of merit: 1; correlation alignment: 1; regularization: 1; interpolation: 1; speaker verification: 1; generalized framework: 1; correlation methods: 1; linear discriminant analysis: 1; unsupervised: 1; discriminant analysis: 1; rapid computation: 1; total variability model: 1; covariance matrices: 1; i vector: 1; automatic speaker verification: 1; normal distribution: 1; short duration speaker verification: 1; phonetic variability: 1; speaker phonetic vector: 1; parameter estimation: 1; fusion: 1; analytic phase: 1; cepstral analysis: 1; long time features: 1; instantaneous frequency: 1; factor analysis: 1; language identification: 1; discriminative training: 1; plda: 1; language detection: 1; estimation theory: 1; channel adaptation: 1; channel prior estimation: 1; probabilistic linear discriminant analysis: 1; text analysis: 1; multi source speaker verification: 1; replay: 1; spoofing: 1; short duration utterance: 1; content aware local variability: 1
Most Publications — 2021: 27; 2022: 26; 2020: 24; 2019: 15; 2018: 14

Affiliations
Institute for Infocomm Research, Singapore

SpeechComm2022 Hongning Zhu, Kong Aik Lee, Haizhou Li 0001, 
Discriminative speaker embedding with serialized multi-layer multi-head attention.

ICASSP2022 Tianchi Liu 0004, Rohan Kumar Das, Kong Aik Lee, Haizhou Li 0001, 
MFA: TDNN with Multi-Scale Frequency-Channel Attention for Text-Independent Speaker Verification with Short Utterances.

ICASSP2022 Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li 0001, 
Self-Supervised Speaker Recognition with Loss-Gated Learning.

ICASSP2022 Fan Yu, Shiliang Zhang, Pengcheng Guo, Yihui Fu, Zhihao Du, Siqi Zheng, Weilong Huang, Lei Xie 0001, Zheng-Hua Tan, DeLiang Wang, Yanmin Qian, Kong Aik Lee, Zhijie Yan, Bin Ma 0001, Xin Xu, Hui Bu, 
Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.

ICASSP2022 Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, Hui Chen, 
Learning Domain-Invariant Transformation for Speaker Verification.

Interspeech2022 Qiongqiong Wang, Kong Aik Lee, Tianchi Liu 0004, 
Scoring of Large-Margin Embeddings for Speaker Verification: Cosine or PLDA?

ICASSP2021 Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, Hui Chen, 
Meta-Learning for Cross-Channel Speaker Verification.

Interspeech2021 Tomi Kinnunen, Andreas Nautsch, Md. Sahidullah, Nicholas W. D. Evans, Xin Wang 0037, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing.

Interspeech2021 Yibo Wu, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang 0001, 
Joint Feature Enhancement and Speaker Recognition with Multi-Objective Task-Oriented Network.

Interspeech2021 Li Zhang 0084, Qing Wang 0039, Kong Aik Lee, Lei Xie 0001, Haizhou Li 0001, 
Multi-Level Transfer Learning from Near-Field to Far-Field Speaker Verification.

Interspeech2021 Hongning Zhu, Kong Aik Lee, Haizhou Li 0001, 
Serialized Multi-Layer Multi-Head Attention for Neural Speaker Embedding.

TASLP2020 Tomi Kinnunen, Héctor Delgado, Nicholas W. D. Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang 0037, Md. Sahidullah, Junichi Yamagishi, Douglas A. Reynolds, 
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals.

TASLP2020 Ivan Kukanov, Trung Ngo Trong, Ville Hautamäki, Sabato Marco Siniscalchi, Valerio Mario Salerno, Kong Aik Lee
Maximal Figure-of-Merit Framework to Detect Multi-Label Phonetic Features for Spoken Language Recognition.

ICASSP2020 Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Takafumi Koshinaka, 
A Generalized Framework for Domain Adaptation of PLDA in Speaker Recognition.

Interspeech2020 Kosuke Akimoto, Seng Pei Liew, Sakiko Mishima, Ryo Mizushima, Kong Aik Lee
POCO: A Voice Spoofing and Liveness Detection Corpus Based on Pop Noise.

Interspeech2020 Kong Aik Lee, Koji Okabe, Hitoshi Yamamoto, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Keisuke Ishikawa, Koichi Shinoda, 
NEC-TT Speaker Verification System for SRE'19 CTS Challenge.

Interspeech2020 Alexey Sholokhov, Tomi Kinnunen, Ville Vestman, Kong Aik Lee
Extrapolating False Alarm Rates in Automatic Speaker Verification.

Interspeech2020 Hossein Zeinali, Kong Aik Lee, Jahangir Alam, Lukás Burget, 
SdSV Challenge 2020: Large-Scale Evaluation of Short-Duration Speaker Verification.

Interspeech2020 Hanyi Zhang, Longbiao Wang, Yunchun Zhang, Meng Liu, Kong Aik Lee, Jianguo Wei, 
Adversarial Separation Network for Speaker Recognition.

Interspeech2020 Dao Zhou, Longbiao Wang, Kong Aik Lee, Yibo Wu, Meng Liu, Jianwu Dang 0001, Jianguo Wei, 
Dynamic Margin Softmax Loss for Speaker Verification.

#60  | Elmar Nöth | Google Scholar   DBLP
Venues — Interspeech: 30; ICASSP: 8; SpeechComm: 3
Years — 2023: 1; 2022: 7; 2021: 7; 2020: 4; 2019: 8; 2018: 4; 2017: 6; 2016: 4
ISCA Section — speech and language analytics for medical applications: 2; pathological speech and language: 2; special session: 2; atypical speech detection: 1; acoustic signal representation and analysis: 1; self-supervised, semi-supervised, adaptation and data augmentation for asr: 1; technology for disordered speech: 1; speech and language in health: 1; show and tell: 1; miscellaneous topics in speech, voice and hearing disorders: 1; speech and audio analysis: 1; voice quality characterization for clinical voice assessment: 1; the interspeech 2021 computational paralinguistics challenge (compare): 1; the adresso challenge: 1; disordered speech: 1; noise reduction and intelligibility: 1; the interspeech 2020 computational paralinguistics challenge (compare): 1; speech perception in adverse listening conditions: 1; speech and audio classification: 1; the interspeech 2019 computational paralinguistics challenge (compare): 1; applications in language learning and healthcare: 1; social signals detection and speaker traits analysis: 1; speech and language analytics for mental health: 1; automatic detection and recognition of voice and speech disorders: 1; voice, speech and hearing disorders: 1; disorders related to speech and language: 1; language modeling for conversational speech and confidence measures: 1
IEEE Keyword — diseases: 7; medical signal processing: 5; neurophysiology: 3; parkinson's disease: 3; natural language processing: 2; alzheimer’s disease: 2; medical diagnostic computing: 2; gait analysis: 2; speech analysis: 2; parkinson’s disease: 2; patient treatment: 2; speech recognition: 2; patient monitoring: 2; updrs: 2; acoustic analysis: 1; medical disorders: 1; linguistic analysis: 1; psen1–e280a: 1; smartphones: 1; deep learning (artificial intelligence): 1; smart phones: 1; gaussian processes: 1; handwriting analysis: 1; ivectors: 1; mixture models: 1; gmm ubm: 1; speaker recognition: 1; recurrent neural nets: 1; language models: 1; neural network language models: 1; automatic diagnosis: 1; long short term memory: 1; speech enhancement: 1; speech impairments: 1; mobile handsets: 1; mobile devices: 1; classification: 1; signal classification: 1; phonological features: 1; speech synthesis: 1; non modal phonation: 1; phonological vocoding: 1; gcca: 1; multi view learning: 1; handwriting processing: 1; frenchay dysarthria assessment: 1; gait processing: 1; articulation: 1; speech: 1; intelligibility: 1
Most Publications — 2022: 28; 2019: 21; 2015: 19; 2009: 18; 2021: 15


SpeechComm2023 Paula Andrea Pérez-Toro, Tomás Arias-Vergara, Philipp Klumpp, Juan Camilo Vásquez-Correa, Maria Schuster, Elmar Nöth, Juan Rafael Orozco-Arroyave, 
Depression assessment in people with Parkinson's disease: The combination of acoustic features and natural language processing.

Interspeech2022 Sebastian Peter Bayerl, Dominik Wagner, Elmar Nöth, Korbinian Riedhammer, 
Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0.

Interspeech2022 Christian Bergler, Alexander Barnhill, Dominik Perrin, Manuel Schmitt, Andreas K. Maier, Elmar Nöth
ORCA-WHISPER: An Automatic Killer Whale Sound Type Generation Toolkit Using Deep Learning.

Interspeech2022 Teena tom Dieck, Paula Andrea Pérez-Toro, Tomas Arias, Elmar Nöth, Philipp Klumpp, 
Wav2vec behind the Scenes: How end2end Models learn Phonetics.

Interspeech2022 Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas K. Maier, Seung Hee Yang, 
Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition.

Interspeech2022 Paula Andrea Pérez-Toro, Philipp Klumpp, Abner Hernandez, Tomas Arias, Patricia Lillo, Andrea Slachevsky, Adolfo Martín García, Maria Schuster, Andreas K. Maier, Elmar Nöth, Juan Rafael Orozco-Arroyave, 
Alzheimer's Detection from English to Spanish Using Acoustic and Linguistic Embeddings.

Interspeech2022 P. Schäfer, Paula Andrea Pérez-Toro, Philipp Klumpp, Juan Rafael Orozco-Arroyave, Elmar Nöth, Andreas K. Maier, A. Abad, Maria Schuster, Tomás Arias-Vergara, 
CoachLea: an Android Application to Evaluate the Speech Production and Perception of Children with Hearing Loss.

Interspeech2022 Tobias Weise, Philipp Klumpp, Andreas K. Maier, Elmar Nöth, Björn Heismann, Maria Schuster, Seung Hee Yang, 
Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment.

ICASSP2021 Paula Andrea Pérez-Toro, Juan Camilo Vásquez-Correa, Tomas Arias-Vergara, Philipp Klumpp, M. Sierra-Castrillón, M. E. Roldán-López, D. Aguillón, L. Hincapié-Henao, Carlos Andrés Tobón-Quintero, Tobias Bocklet, Maria Schuster, Juan Rafael Orozco-Arroyave, Elmar Nöth
Acoustic and Linguistic Analyses to Assess Early-Onset and Genetic Alzheimer's Disease.

ICASSP2021 Juan Camilo Vásquez-Correa, Tomas Arias-Vergara, Philipp Klumpp, Paula Andrea Pérez-Toro, Juan Rafael Orozco-Arroyave, Elmar Nöth
End-2-End Modeling of Speech and Gait from Patients with Parkinson's Disease: Comparison Between High Quality Vs. Smartphone Data.

Interspeech2021 Christian Bergler, Manuel Schmitt, Andreas K. Maier, Helena Symonds, Paul Spong, Steven R. Ness, George Tzanetakis, Elmar Nöth
ORCA-SLANG: An Automatic Multi-Stage Semi-Supervised Deep Learning Framework for Large-Scale Killer Whale Call Type Identification.

Interspeech2021 Carlos A. Ferrer, Efren Aragón, María E. Hdez-Díaz, Marc S. De Bodt, Roman Cmejla, Marina Englert, Mara Behlau, Elmar Nöth
Modeling Dysphonia Severity as a Function of Roughness and Breathiness Ratings in the GRBAS Scale.

Interspeech2021 Philipp Klumpp, Tobias Bocklet, Tomas Arias-Vergara, Juan Camilo Vásquez-Correa, Paula Andrea Pérez-Toro, Sebastian P. Bayerl, Juan Rafael Orozco-Arroyave, Elmar Nöth
The Phonetic Footprint of Covid-19?

Interspeech2021 Paula Andrea Pérez-Toro, Sebastian P. Bayerl, Tomas Arias-Vergara, Juan Camilo Vásquez-Correa, Philipp Klumpp, Maria Schuster, Elmar Nöth, Juan Rafael Orozco-Arroyave, Korbinian Riedhammer, 
Influence of the Interviewer on the Automatic Assessment of Alzheimer's Disease in the Context of the ADReSSo Challenge.

Interspeech2021 Juan Camilo Vásquez-Correa, Julian Fritsch, Juan Rafael Orozco-Arroyave, Elmar Nöth, Mathew Magimai-Doss, 
On Modeling Glottal Source Information for Phonation Assessment in Parkinson's Disease.

SpeechComm2020 Juan Camilo Vásquez-Correa, Tomas Arias-Vergara, Maria Schuster, Juan Rafael Orozco-Arroyave, Elmar Nöth
Parallel Representation Learning for the Classification of Pathological Speech: Studies on Parkinson's Disease and Cleft Lip and Palate.

ICASSP2020 Juan Camilo Vásquez-Correa, Tobias Bocklet, Juan Rafael Orozco-Arroyave, Elmar Nöth
Comparison of User Models Based on GMM-UBM and I-Vectors for Speech, Handwriting, and Gait Assessment of Parkinson's Disease Patients.

Interspeech2020 Christian Bergler, Manuel Schmitt, Andreas Maier 0001, Simeon Smeele, Volker Barth, Elmar Nöth
ORCA-CLEAN: A Deep Denoising Toolkit for Killer Whale Communication.

Interspeech2020 Philipp Klumpp, Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Paula Andrea Pérez-Toro, Florian Hönig, Elmar Nöth, Juan Rafael Orozco-Arroyave, 
Surgical Mask Detection with Deep Recurrent Phonetic Models.

ICASSP2019 Julian Fritsch, Sebastian Wankerl, Elmar Nöth
Automatic Diagnosis of Alzheimer's Disease Using Neural Network Language Models.

#61  | Florian Metze | Google Scholar   DBLP
Venues — Interspeech: 22; ICASSP: 14; TASLP: 1; SpeechComm: 1; NeurIPS: 1; ACL: 1; NAACL: 1
Years — 2022: 3; 2021: 5; 2020: 6; 2019: 10; 2018: 7; 2017: 4; 2016: 6
ISCA Section — cross/multi-lingual and code-switched asr: 2; special session: 2; acoustic event detection and classification: 1; low-resource asr development: 1; spoken language understanding: 1; spoken language processing: 1; asr neural network architectures: 1; multilingual and code-switched asr: 1; nn architectures for asr: 1; cross-lingual and multilingual asr: 1; speech annotation and labelling: 1; multimodal asr: 1; speaker diarization: 1; acoustic scenes and rare events: 1; audio events and acoustic scenes: 1; asr systems and technologies: 1; music and audio processing: 1; search, computational strategies and language modeling: 1; new products and services: 1; acoustic modeling with neural networks: 1
IEEE Keyword — speech recognition: 11; natural language processing: 6; text analysis: 3; automatic speech recognition: 2; image retrieval: 2; diarization: 2; machine translation: 2; speaker recognition: 2; video signal processing: 2; recurrent neural nets: 2; long sequence modeling: 1; speech summarization: 1; concept learning: 1; end to end: 1; multilingual phonetic dataset: 1; multilingual speech alignment: 1; low resource speech recognition: 1; human computer interaction: 1; image representation: 1; speech synthesis: 1; unsupervised learning: 1; universal phone recognition: 1; multilingual speech recognition: 1; phonology: 1; computational linguistics: 1; domain adaptation: 1; language translation: 1; asr error correction: 1; medical transcription: 1; robustness: 1; noisy asr: 1; multimodal learning: 1; multimodal asr: 1; error statistics: 1; ctc based decoding: 1; multilingual language models: 1; low resource asr: 1; phoneme level language models: 1; adaptation: 1; audiovisual speech recognition: 1; connectionist temporal classification: 1; sequence to sequence model: 1; overlap detection: 1; speech enhancement: 1; multi modal data: 1; unwritten languages: 1; unsupervised unit discovery: 1; linguistics: 1; multimedia computing: 1; multimodal processing: 1; word processing: 1; ubiquitous computing: 1; audio visual speech recognition: 1; sequences: 1; connectionist temporal classification (ctc): 1; pattern classification: 1; sound event detection (sed): 1; recurrent neural networks (rnn): 1; audio signal processing: 1; lstms: 1; rnns: 1; acoustic modeling: 1; decision trees: 1; signal classification: 1; ctc: 1; noisemes: 1; long short term memory (lstm): 1; multimedia event detection (med): 1; recurrent neural networks (rnns): 1; multimedia systems: 1
Most Publications — 2019: 39; 2018: 31; 2021: 28; 2020: 28; 2014: 24

Affiliations
Carnegie Mellon University, Pittsburgh, USA

ICASSP2022 Roshan Sharma, Shruti Palaskar, Alan W. Black, Florian Metze
End-to-End Speech Summarization Using Restricted Self-Attention.

Interspeech2022 Juncheng Li 0001, Shuhui Qu, Po-Yao Huang 0001, Florian Metze
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification.

Interspeech2022 Xinjian Li, Florian Metze, David R. Mortensen, Alan W. Black, Shinji Watanabe 0001, 
ASR2K: Speech Recognition for Around 2000 Languages without Audio.

ICASSP2021 Xinjian Li, David R. Mortensen, Florian Metze, Alan W. Black, 
Multilingual Phonetic Dataset for Low Resource Speech Recognition.

Interspeech2021 Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan 0002, Siddharth Dalmia, Florian Metze, Shinji Watanabe 0001, Alan W. Black, 
Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding.

Interspeech2021 Xinjian Li, Juncheng Li 0001, Florian Metze, Alan W. Black, 
Hierarchical Phone Recognition with Compositional Phonetics.

Interspeech2021 Shruti Palaskar, Ruslan Salakhutdinov, Alan W. Black, Florian Metze
Multimodal Speech Summarization Through Semantic Concept Learning.

Interspeech2021 Brian Yan, Siddharth Dalmia, David R. Mortensen, Florian Metze, Shinji Watanabe 0001, 
Differentiable Allophone Graphs for Language-Universal Speech Recognition.

TASLP2020 Odette Scharenborg, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller 0001, 
Speech Technology for Unwritten Languages.

ICASSP2020 Xinjian Li, Siddharth Dalmia, Juncheng Li 0001, Matthew Lee 0012, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W. Black, Florian Metze
Universal Phone Recognition with a Multilingual Allophone System.

ICASSP2020 Anirudh Mani, Shruti Palaskar, Nimshi Venkat Meripo, Sandeep Konam, Florian Metze
ASR Error Correction and Domain Adaptation Using Machine Translation.

ICASSP2020 Tejas Srinivasan, Ramon Sanabria, Florian Metze
Looking Enhances Listening: Recovering Missing Speech Using Images.

Interspeech2020 Mahaveer Jain, Gil Keren, Jay Mahadeokar, Geoffrey Zweig, Florian Metze, Yatharth Saraf, 
Contextual RNN-T for Open Domain ASR.

Interspeech2020 Zimeng Qiu, Yiyuan Li, Xinjian Li, Florian Metze, William M. Campbell, 
Towards Context-Aware End-to-End Code-Switching Speech Recognition.

SpeechComm2019 Okko Räsänen, Shreyas Seshadri, Julien Karadayi, Eric Riebling, John Bunce, Alejandrina Cristià, Florian Metze, Marisa Casillas, Celia Rosemberg, Elika Bergelson, Melanie Soderstrom, 
Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech.

ICASSP2019 Ozan Caglayan, Ramon Sanabria, Shruti Palaskar, Loïc Barrault, Florian Metze
Multimodal Grounding for Sequence-to-sequence Speech Recognition.

ICASSP2019 Siddharth Dalmia, Xinjian Li, Alan W. Black, Florian Metze
Phoneme Level Language Models for Sequence Based Low Resource ASR.

Interspeech2019 Suyoun Kim, Siddharth Dalmia, Florian Metze
Cross-Attention End-to-End ASR for Two-Party Conversations.

Interspeech2019 Xinjian Li, Siddharth Dalmia, Alan W. Black, Florian Metze
Multilingual Speech Recognition with Corpus Relatedness Sampling.

Interspeech2019 Xinjian Li, Zhong Zhou, Siddharth Dalmia, Alan W. Black, Florian Metze
SANTLR: Speech Annotation Toolkit for Low Resource Languages.

#62  | Carlos Busso | Google Scholar   DBLP
Venues — Interspeech: 21; ICASSP: 11; TASLP: 6; SpeechComm: 3
Years — 2022: 5; 2021: 3; 2020: 5; 2019: 5; 2018: 8; 2017: 5; 2016: 10
ISCA Section — speech emotion recognition: 3; emotion modeling: 2; (multimodal) speech emotion recognition: 1; emotion and sentiment analysis: 1; voice activity detection: 1; computational paralinguistics: 1; emotion modeling and analysis: 1; speaker state and trait: 1; emotion recognition and analysis: 1; syllabification, rhythm, and voice activity detection: 1; multimodal paralinguistics: 1; emotion recognition: 1; source separation and voice activity detection: 1; automatic assessment of emotions: 1; speaker states and traits: 1; special session: 1; topics in speech processing: 1; multimodal processing: 1
IEEE Keyword — emotion recognition: 14; speech recognition: 11; speech emotion recognition: 5; regression analysis: 4; pattern classification: 3; natural language processing: 2; audio visual systems: 2; inter evaluator agreement: 2; information retrieval: 2; preference learning: 2; image classification: 2; support vector machines: 2; distribution label learning: 1; multi label learning: 1; soft label learning: 1; neural net architecture: 1; auxiliary networks: 1; shared losses: 1; multimodal fusion: 1; transformers: 1; audiovisual emotion recognition: 1; feature selection: 1; acoustic noise: 1; acoustic feature: 1; noisy speech: 1; signal representation: 1; disentangled representation learning: 1; guided representation learning: 1; audio generation: 1; audio signal processing: 1; and generative adversarial neural network: 1; semi supervised emotion recognition: 1; ladder networks: 1; signal denoising: 1; monte carlo dropout: 1; monte carlo methods: 1; activation functions: 1; reject option: 1; curriculum learning: 1; triplet loss: 1; emotion retrieval: 1; perception: 1; ranking: 1; speech synthesis: 1; unlabeled adaptation of acoustic emotional models: 1; adversarial training: 1; image representation: 1; gradient methods: 1; gaussian processes: 1; vocabulary: 1; multimodal deep learning: 1; hidden markov models: 1; audiovisual large vocabulary automatic speech recognition: 1; iterative methods: 1; svm adaptation: 1; active learning: 1; speaker verification: 1; speaker recognition: 1; time continuous emotional descriptors: 1; relative emotional labels: 1; rank based emotion recognition: 1; reliability: 1; crowdsourcing: 1; video signal processing: 1; data compression: 1; speech summarization: 1; information resources: 1; summary composition: 1; convergence: 1; data mining: 1; entrainment: 1; human human interaction: 1; synchrony: 1; multimodal: 1; text analysis: 1; rank svm: 1; signal classification: 1
Most Publications — 2018: 19; 2017: 19; 2016: 17; 2022: 16; 2021: 16


ICASSP2022 Huang-Cheng Chou, Wei-Cheng Lin, Chi-Chun Lee, Carlos Busso
Exploiting Annotators' Typed Description of Emotion Perception to Maximize Utilization of Ratings for Speech Emotion Recognition.

ICASSP2022 Lucas Goncalves, Carlos Busso
AuxFormer: Robust Approach to Audiovisual Emotion Recognition.

ICASSP2022 Seong-Gyun Leem, Daniel Fulford, Jukka-Pekka Onnela, David Gard, Carlos Busso
Not All Features are Equal: Selection of Robust Features for Speech Emotion Recognition in Noisy Environments.

Interspeech2022 Huang-Cheng Chou, Chi-Chun Lee, Carlos Busso
Exploiting Co-occurrence Frequency of Emotions in Perceptual Evaluations To Train A Speech Emotion Classifier.

Interspeech2022 Lucas Goncalves, Carlos Busso
Improving Speech Emotion Recognition Using Self-Supervised Learning with Domain-Specific Audiovisual Tasks.

TASLP2021 Kazi Nazmul Haque, Rajib Rana, Jiajun Liu, John H. L. Hansen, Nicholas Cummins, Carlos Busso, Björn W. Schuller, 
Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation Using Fewer Labelled Audio Data.

Interspeech2021 Seong-Gyun Leem, Daniel Fulford, Jukka-Pekka Onnela, David Gard, Carlos Busso
Separation of Emotional and Reconstruction Embeddings on Ladder Network to Improve Speech Emotion Recognition Robustness in Noisy Conditions.

Interspeech2021 Jarrod Luckenbaugh, Samuel Abplanalp, Rachel Gonzalez, Daniel Fulford, David Gard, Carlos Busso
Voice Activity Detection with Teacher-Student Domain Emulation.

TASLP2020 Srinivas Parthasarathy, Carlos Busso
Semi-Supervised Speech Emotion Recognition With Ladder Networks.

ICASSP2020 Kusha Sridhar, Carlos Busso
Modeling Uncertainty in Predicting Emotional Attributes from Spontaneous Speech.

Interspeech2020 Wei-Cheng Lin, Carlos Busso
An Efficient Temporal Modeling Approach for Speech Emotion Recognition by Mapping Varied Duration Sentences into Fixed Number of Chunks.

Interspeech2020 Luz Martinez-Lucas, Mohammed Abdelwahab 0001, Carlos Busso
The MSP-Conversation Corpus.

Interspeech2020 Kusha Sridhar, Carlos Busso
Ensemble of Students Taught by Probabilistic Teachers to Improve Speech Emotion Recognition.

SpeechComm2019 Najmeh Sadoughi, Carlos Busso
Speech-driven animation with meaningful behaviors.

SpeechComm2019 Fei Tao, Carlos Busso
End-to-end audiovisual speech activity detection with bimodal recurrent neural models.

TASLP2019 Reza Lotfian, Carlos Busso
Curriculum Learning for Speech Emotion Recognition From Crowdsourced Labels.

ICASSP2019 John B. Harvill, Mohammed Abdel-Wahab 0001, Reza Lotfian, Carlos Busso
Retrieving Speech Samples with Similar Emotional Content Using a Triplet Loss Function.

Interspeech2019 Kusha Sridhar, Carlos Busso
Speech Emotion Recognition with a Reject Option.

SpeechComm2018 Najmeh Sadoughi, Yang Liu, Carlos Busso
Meaningful head movements driven by emotional synthetic speech.

TASLP2018 Mohammed Abdel-Wahab 0001, Carlos Busso
Domain Adversarial for Acoustic Emotion Recognition.

#63  | Jesús Villalba 0001 | Google Scholar   DBLP
Venues — Interspeech: 28; ICASSP: 11; SpeechComm: 1; TASLP: 1
Years — 2022: 6; 2021: 10; 2020: 8; 2019: 6; 2018: 7; 2017: 2; 2016: 2
ISCA Section — trustworthy speech processing: 3; robust speaker recognition: 2; the attacker’s perspective on automatic speaker verification: 2; speaker verification: 2; speaker state and trait: 2; self supervision and anti-spoofing: 1; speaker recognition and diarization: 1; voice activity detection and keyword spotting: 1; non-autoregressive sequential modeling for speech processing: 1; the adresso challenge: 1; embedding and network architecture for speaker recognition: 1; voice anti-spoofing and countermeasure: 1; the zero resource speech challenge 2020: 1; speaker embedding: 1; speaker recognition and anti-spoofing: 1; the 2019 automatic speaker verification spoofing and countermeasures challenge: 1; the voices from a distance challenge: 1; speaker recognition evaluation: 1; language identification: 1; the first dihard speech diarization challenge: 1; speaker recognition: 1; speaker and language recognition applications: 1
IEEE Keyword — speaker recognition: 9; speech recognition: 4; speaker verification: 4; emotion recognition: 3; supervised learning: 2; perceptual loss: 2; natural language processing: 2; signal denoising: 2; transfer learning: 2; feature enhancement: 2; speech enhancement: 2; understanding: 1; object detection: 1; connectionist temporal classification: 1; regularization: 1; attention: 1; automatic speech recognition: 1; speech synthesis: 1; unsupervised learning: 1; text to speech: 1; multi task learning: 1; self supervised features: 1; speech denoising: 1; deep learning (artificial intelligence): 1; audio signal processing: 1; signal classification: 1; pre trained networks: 1; copypaste: 1; data augmentation: 1; x vector: 1; deep feature loss: 1; medical signal processing: 1; speech: 1; diseases: 1; parkinson’s disease: 1; medical disorders: 1; i vectors: 1; x vectors: 1; patient diagnosis: 1; neurophysiology: 1; linear discriminant analysis: 1; data handling: 1; dereverberation: 1; far field adaptation: 1; cyclegan: 1; channel bank filters: 1; x vector: 1; pre trained: 1; cold fusion: 1; storage management: 1; deep fusion: 1; sequence to sequence: 1; shallow fusion: 1; language model: 1; automatic speech recognition (asr): 1; generative adversarial neural networks (gans): 1; unsupervised domain adaptation: 1; cycle gans: 1; microphones: 1; regression analysis: 1; mean square error methods: 1; gaussian processes: 1; lstm: 1; rnn: 1; uncertainty estimation: 1; age issues: 1; age estimation: 1; reliability: 1; robustness: 1; bayesian networks: 1; quality measures: 1
Most Publications — 2020: 21; 2021: 20; 2019: 19; 2022: 17; 2018: 13

Affiliations
Johns Hopkins University, Center for Language and Speech Processing, Baltimore, MD, USA
University of Zaragoza, Spain

Interspeech2022 Jaejin Cho, Raghavendra Pappagari, Piotr Zelasko, Laureano Moro-Velázquez, Jesús Villalba 0001, Najim Dehak, 
Non-contrastive self-supervised learning of utterance-level speech representations.

Interspeech2022 Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesús Villalba 0001, Sanjeev Khudanpur, Najim Dehak, 
Defense against Adversarial Attacks on Hybrid Speech Recognition System using Adversarial Fine-tuning with Denoiser.

Interspeech2022 Sonal Joshi, Saurabh Kataria, Jesús Villalba 0001, Najim Dehak, 
AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification.

Interspeech2022 Saurabh Kataria, Jesús Villalba 0001, Laureano Moro-Velázquez, Najim Dehak, 
Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification.

Interspeech2022 Magdalena Rybicka, Jesús Villalba 0001, Najim Dehak, Konrad Kowalczyk, 
End-to-End Neural Speaker Diarization with an Iterative Refinement of Non-Autoregressive Attention-based Attractors.

Interspeech2022 Yiwen Shao, Jesús Villalba 0001, Sonal Joshi, Saurabh Kataria, Sanjeev Khudanpur, Najim Dehak, 
Chunking Defense for Adversarial Attacks on ASR.

ICASSP2021 Nanxin Chen, Piotr Zelasko, Jesús Villalba 0001, Najim Dehak, 
Focus on the Present: A Regularization Method for the ASR Source-Target Attention Layer.

ICASSP2021 Jaejin Cho, Piotr Zelasko, Jesús Villalba 0001, Najim Dehak, 
Improving Reconstruction Loss Based Speaker Embedding in Unsupervised and Semi-Supervised Scenarios.

ICASSP2021 Saurabh Kataria, Jesús Villalba 0001, Najim Dehak, 
Perceptual Loss Based Speech Denoising with an Ensemble of Audio Pattern Recognition and Self-Supervised Models.

ICASSP2021 Raghavendra Pappagari, Jesús Villalba 0001, Piotr Zelasko, Laureano Moro-Velázquez, Najim Dehak, 
CopyPaste: An Augmentation Method for Speech Emotion Recognition.

Interspeech2021 Saurabhchand Bhati, Jesús Villalba 0001, Piotr Zelasko, Laureano Moro-Velázquez, Najim Dehak, 
Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation.

Interspeech2021 Nanxin Chen, Piotr Zelasko, Laureano Moro-Velázquez, Jesús Villalba 0001, Najim Dehak, 
Align-Denoise: Single-Pass Non-Autoregressive Speech Recognition.

Interspeech2021 Saurabh Kataria, Jesús Villalba 0001, Piotr Zelasko, Laureano Moro-Velázquez, Najim Dehak, 
Deep Feature CycleGANs: Speaker Identity Preserving Non-Parallel Microphone-Telephone Domain Adaptation for Speaker Verification.

Interspeech2021 Raghavendra Pappagari, Jaejin Cho, Sonal Joshi, Laureano Moro-Velázquez, Piotr Zelasko, Jesús Villalba 0001, Najim Dehak, 
Automatic Detection and Assessment of Alzheimer Disease Using Speech and Language Technologies in Low-Resource Scenarios.

Interspeech2021 Magdalena Rybicka, Jesús Villalba 0001, Piotr Zelasko, Najim Dehak, Konrad Kowalczyk, 
Spine2Net: SpineNet with Res2Net and Time-Squeeze-and-Excitation Blocks for Speaker Recognition.

Interspeech2021 Jesús Villalba 0001, Sonal Joshi, Piotr Zelasko, Najim Dehak, 
Representation Learning to Classify and Detect Adversarial Attacks Against Speaker and Speech Recognition Systems.

ICASSP2020 Saurabh Kataria, Phani Sankar Nidadavolu, Jesús Villalba 0001, Nanxin Chen, L. Paola García-Perera, Najim Dehak, 
Feature Enhancement with Deep Feature Losses for Speaker Verification.

ICASSP2020 Laureano Moro-Velázquez, Jesús Villalba 0001, Najim Dehak, 
Using X-Vectors to Automatically Detect Parkinson's Disease from Speech.

ICASSP2020 Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba 0001, L. Paola García-Perera, Najim Dehak, 
Unsupervised Feature Enhancement for Speaker Verification.

ICASSP2020 Raghavendra Pappagari, Tianzi Wang, Jesús Villalba 0001, Nanxin Chen, Najim Dehak, 
X-Vectors Meet Emotions: A Study On Dependencies Between Emotion and Speaker Recognition.

#64  | Zheng-Hua Tan | Google Scholar   DBLP
Venues: Interspeech: 15; TASLP: 11; ICASSP: 10; SpeechComm: 4
Years: 2023: 1; 2022: 4; 2021: 3; 2020: 6; 2019: 4; 2018: 5; 2017: 9; 2016: 8
ISCA Sections: special session: 2; robust speaker recognition and anti-spoofing: 2; speech segmentation: 1; speech synthesis: 1; speech recognition and beyond: 1; language identification: 1; speech intelligibility: 1; speaker recognition evaluation: 1; speech-enhancement: 1; speaker database and anti-spoofing: 1; short utterances speaker recognition: 1; speech and audio segmentation and classification: 1; speaker recognition: 1
IEEE Keywords: speech enhancement: 10; speech intelligibility: 6; array signal processing: 4; speaker recognition: 4; microphone arrays: 3; speech recognition: 3; audio visual systems: 3; audio signal processing: 3; maximum likelihood estimation: 3; deep neural networks: 3; hearing aids: 3; maximum likelihood: 2; computational complexity: 2; speech separation: 2; sensor fusion: 2; filtering theory: 2; security of data: 2; least mean squares methods: 2; speaker verification: 2; enhanced speech: 2; speech in noise: 2; binaural speech intelligibility prediction: 2; hearing: 1; turn taking: 1; beamforming: 1; speech behavior: 1; signal denoising: 1; asii: 1; approximation theory: 1; multi microphone: 1; beamformer: 1; speech intelligibility enhancement: 1; multi speaker asr: 1; alimeeting: 1; speaker diarization: 1; m2met: 1; natural language processing: 1; meeting transcription: 1; keyword spotting: 1; keyword embedding: 1; deep metric learning: 1; text analysis: 1; multi condition training: 1; noise robustness: 1; loss function: 1; audio visual processing: 1; speech synthesis: 1; sound source separation: 1; source separation: 1; multi task learning: 1; face landmarks: 1; audio visual: 1; deep learning (artificial intelligence): 1; speech inpainting: 1; supervised learning: 1; mean square error methods: 1; fully convolutional neural networks: 1; time domain: 1; objective intelligibility: 1; gradient methods: 1; multichannel speech enhancement: 1; probability: 1; kalman filter: 1; recursive expectation maximization: 1; speech presence probability: 1; expectation maximisation algorithm: 1; own voice retrieval: 1; multi microphone speech enhancement: 1; spectral analysis: 1; power spectral density estimation: 1; signal classification: 1; convolutional neural nets: 1; adversarial attack: 1; convolutional neural network: 1; web sites: 1; cepstral feature: 1; minimum mean square error estimator: 1; correlation theory: 1; gaussian processes: 1; brain: 1; dnns: 1; cepstral analysis: 1; bottleneck feature: 1; image segmentation: 1; gmm ubm: 1; pattern clustering: 1; time contrastive learning: 1; objective functions: 1; training targets: 1; audio visual speech enhancement: 1; convolution: 1; nonintrusive speech intelligibility prediction: 1; convolutional neural networks: 1; transfer functions: 1; relative transfer function: 1; hearing aid: 1; direction of arrival estimation: 1; reverberation: 1; sound source localization: 1; ideal ratio mask: 1; generalizability: 1; intelligibility: 1; non intrusive speech intelligibility prediction: 1; replay: 1; spoofing: 1; cocktail party problem: 1; permutation invariant training: 1; cnn: 1; dnn: 1; time frequency analysis: 1; binaural advantage: 1; speech transmission: 1; microphones: 1; interference (signal): 1
Most Publications: 2020: 27; 2018: 27; 2016: 27; 2021: 25; 2017: 23


SpeechComm2023 Iván López-Espejo, Amin Edraki, Wai-Yip Chan, Zheng-Hua Tan, Jesper Jensen 0001, 
On the deficiency of intelligibility metrics as proxies for subjective intelligibility.

TASLP2022 Poul Hoang, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen 0001, 
Multichannel Speech Enhancement With Own Voice-Based Interfering Speech Suppression for Hearing Assistive Devices.

ICASSP2022 Andreas Jonas Fuglsig, Jan Østergaard, Jesper Jensen 0001, Lars Søndergaard Bertelsen, Peter Mariager, Zheng-Hua Tan
Joint Far- and Near-End Speech Intelligibility Enhancement Based on the Approximated Speech Intelligibility Index.

ICASSP2022 Fan Yu, Shiliang Zhang, Pengcheng Guo, Yihui Fu, Zhihao Du, Siqi Zheng, Weilong Huang, Lei Xie 0001, Zheng-Hua Tan, DeLiang Wang, Yanmin Qian, Kong Aik Lee, Zhijie Yan, Bin Ma 0001, Xin Xu, Hui Bu, 
Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.

Interspeech2022 Claus M. Larsen, Peter Koch 0001, Zheng-Hua Tan
Adversarial Multi-Task Deep Learning for Noise-Robust Voice Activity Detection with Low Algorithmic Delay.

TASLP2021 Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen 0001, 
A Novel Loss Function and Training Strategy for Noise-Robust Keyword Spotting.

TASLP2021 Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu 0004, Meng Yu 0003, Dong Yu 0001, Jesper Jensen 0001, 
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation.

ICASSP2021 Giovanni Morrone, Daniel Michelsanti, Zheng-Hua Tan, Jesper Jensen 0001, 
Audio-Visual Speech Inpainting with Deep Learning.

SpeechComm2020 Daniel Michelsanti, Zheng-Hua Tan, Sigurdur Sigurdsson, Jesper Jensen 0001, 
Deep-learning-based audio-visual speech enhancement in presence of Lombard effect.

TASLP2020 Morten Kolbæk, Zheng-Hua Tan, Søren Holdt Jensen, Jesper Jensen 0001, 
On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement.

TASLP2020 Juan M. Martín-Doñas, Jesper Jensen 0001, Zheng-Hua Tan, Angel M. Gomez, Antonio M. Peinado, 
Online Multichannel Speech Enhancement Based on Recursive EM and DNN-Based Speech Presence Estimation.

ICASSP2020 Poul Hoang, Zheng-Hua Tan, Thomas Lunner, Jan Mark de Haan, Jesper Jensen 0001, 
Maximum Likelihood Estimation of the Interference-Plus-Noise Cross Power Spectral Density Matrix for Own Voice Retrieval.

ICASSP2020 Saeid Samizade, Zheng-Hua Tan, Chao Shen 0001, Xiaohong Guan, 
Adversarial Example Detection by Classification for Deep Speech Recognition.

Interspeech2020 Daniel Michelsanti, Olga Slizovskaia, Gloria Haro, Emilia Gómez, Zheng-Hua Tan, Jesper Jensen 0001, 
Vocoder-Based Speech Synthesis from Silent Videos.

TASLP2019 Morten Kolbaek, Zheng-Hua Tan, Jesper Jensen 0001, 
On the Relationship Between Short-Time Objective Intelligibility and Short-Time Spectral-Amplitude Mean-Square Error for Speech Enhancement.

TASLP2019 Achintya Kumar Sarkar, Zheng-Hua Tan, Hao Tang 0002, Suwon Shon, James R. Glass, 
Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification.

ICASSP2019 Daniel Michelsanti, Zheng-Hua Tan, Sigurdur Sigurdsson, Jesper Jensen 0001, 
On Training Targets and Objective Functions for Deep-learning-based Audio-visual Speech Enhancement.

Interspeech2019 Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen 0001, 
Keyword Spotting for Hearing Assistive Devices Robust to External Speakers.

SpeechComm2018 Renhua Peng, Zheng-Hua Tan, Xiaodong Li 0002, Chengshi Zheng, 
A perceptually motivated LP residual estimator in noisy and reverberant environments.

SpeechComm2018 Asger Heidemann Andersen, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen 0001, 
Refinement and validation of the binaural short time objective intelligibility measure for spatially diverse conditions.

#65  | Bin Ma 0001 | Google Scholar   DBLP
Venues: Interspeech: 20; ICASSP: 16; SpeechComm: 2; EMNLP: 1; TASLP: 1
Years: 2022: 3; 2021: 3; 2020: 5; 2019: 5; 2018: 1; 2017: 9; 2016: 14
ISCA Sections: spoken term detection: 4; special session: 4; asr neural network architectures: 2; cross-lingual and multilingual asr: 2; acoustic model adaptation for asr: 1; speech synthesis: 1; far-field speech recognition: 1; neural techniques for voice conversion and waveform generation: 1; speaker recognition evaluation: 1; resources and annotation of resources: 1; language recognition: 1; feature extraction and acoustic modeling using neural networks for asr: 1
IEEE Keywords: speech recognition: 10; natural language processing: 6; speaker recognition: 4; recurrent neural nets: 3; speech enhancement: 3; text analysis: 3; keyword spotting: 3; spoken term detection: 3; time frequency analysis: 2; convolutional neural nets: 2; probability: 2; estimation theory: 2; language translation: 1; lightweight adaptation: 1; prefix tuning: 1; speech to text translation: 1; multi speaker asr: 1; alimeeting: 1; speaker diarization: 1; m2met: 1; meeting transcription: 1; microphone arrays: 1; frequency recurrence: 1; speech coding: 1; feature representation: 1; frequency domain analysis: 1; complex network: 1; deep learning (artificial intelligence): 1; attention mechanism: 1; lpcnet: 1; gaussian processes: 1; voice conversion: 1; cross lingual: 1; phonetic posteriorgrams: 1; speech synthesis: 1; vocoders: 1; fastspeech: 1; independent language model: 1; low resource asr: 1; pre training: 1; catastrophic forgetting: 1; fine tuning: 1; audio visual speech recognition: 1; audio visual systems: 1; multi condition training: 1; robust speech recognition: 1; dropout: 1; bimodal df smn: 1; laplacian eigenmaps: 1; laplacian probabilistic latent semantic analysis: 1; graph regularization: 1; matrix algebra: 1; graph theory: 1; topic modeling: 1; data structures: 1; topic segmentation: 1; data reduction: 1; channel adaptation: 1; channel prior estimation: 1; probabilistic linear discriminant analysis: 1; multi source speaker verification: 1; recurrent neural network: 1; multilingual data selection: 1; language identification: 1; a priori snr: 1; pattern matching: 1; pairwise learning: 1; autoencoder: 1; low resource speech processing: 1; bottleneck features: 1; short duration utterance: 1; speaker verification: 1; content aware local variability: 1; deep neural network (dnn): 1; large vocabulary continuous speech recognition (lvcsr): 1; under resourced languages: 1; spoken term detection (std): 1; automatic speech recognition (asr): 1; computational linguistics: 1; submodular optimization: 1; active learning: 1; query processing: 1; data augmentation: 1; time series: 1; dtw: 1; audio signal processing: 1; partial matching: 1; query by example: 1; reverberation: 1; multi task learning: 1; i vector: 1; speaker adaptation: 1; deep neural network: 1; noise robustness: 1
Most Publications: 2010: 28; 2015: 21; 2014: 21; 2016: 19; 2017: 18

Affiliations
Alibaba Group, Speech Lab, Singapore
Nanyang Technological University, School of Computer Science and Engineering, Singapore
Institute for Infocomm Research, A*STAR, Singapore (since 2004)
University of Hong Kong, Hong Kong (PhD 2000)

ICASSP2022 Yukun Ma, Trung Hieu Nguyen 0001, Bin Ma 0001
CPT: Cross-Modal Prefix-Tuning for Speech-To-Text Translation.

ICASSP2022 Fan Yu, Shiliang Zhang, Pengcheng Guo, Yihui Fu, Zhihao Du, Siqi Zheng, Weilong Huang, Lei Xie 0001, Zheng-Hua Tan, DeLiang Wang, Yanmin Qian, Kong Aik Lee, Zhijie Yan, Bin Ma 0001, Xin Xu, Hui Bu, 
Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge.

ICASSP2022 Shengkui Zhao, Bin Ma 0001, Karn N. Watcharasupat, Woon-Seng Gan, 
FRCRN: Boosting Feature Representation Using Frequency Recurrence for Monaural Speech Enhancement.

ICASSP2021 Shengkui Zhao, Trung Hieu Nguyen 0001, Bin Ma 0001
Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses.

ICASSP2021 Shengkui Zhao, Hao Wang, Trung Hieu Nguyen 0001, Bin Ma 0001
Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram.

EMNLP2021 Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq R. Joty, Eng Siong Chng, Bin Ma 0001
A Unified Speaker Adaptation Approach for ASR.

ICASSP2020 Van Tung Pham, Haihua Xu, Yerbolat Khassanov, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma 0001, Haizhou Li 0001, 
Independent Language Modeling Architecture for End-To-End ASR.

Interspeech2020 Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq R. Joty, Eng Siong Chng, Bin Ma 0001
Speech Transformer with Speaker Aware Persistent Memory.

Interspeech2020 Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq R. Joty, Eng Siong Chng, Bin Ma 0001
Universal Speech Transformer.

Interspeech2020 Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq R. Joty, Eng Siong Chng, Bin Ma 0001
Cross Attention with Monotonic Alignment for Speech Transformer.

Interspeech2020 Shengkui Zhao, Trung Hieu Nguyen 0001, Hao Wang, Bin Ma 0001
Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion.

ICASSP2019 Shiliang Zhang, Ming Lei, Bin Ma 0001, Lei Xie 0001, 
Robust Audio-visual Speech Recognition Using Bimodal Dfsmn with Multi-condition Training and Dropout Regularization.

Interspeech2019 Yerbolat Khassanov, Haihua Xu, Van Tung Pham, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma 0001
Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data.

Interspeech2019 Shiliang Zhang, Yuan Liu, Ming Lei, Bin Ma 0001, Lei Xie 0001, 
Towards Language-Universal Mandarin-English Speech Recognition.

Interspeech2019 Shengkui Zhao, Chongjia Ni, Rong Tong, Bin Ma 0001
Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition.

Interspeech2019 Shengkui Zhao, Trung Hieu Nguyen 0001, Hao Wang, Bin Ma 0001
Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks.

Interspeech2018 Yougen Yuan, Cheung-Chi Leung, Lei Xie 0001, Hongjie Chen, Bin Ma 0001, Haizhou Li 0001, 
Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search.

SpeechComm2017 Chang Huai You, Bin Ma 0001
Spectral-domain speech enhancement for speech recognition.

TASLP2017 Hongjie Chen, Lei Xie 0001, Cheung-Chi Leung, Xiaoming Lu, Bin Ma 0001, Haizhou Li 0001, 
Modeling Latent Topics and Temporal Distance for Story Segmentation of Broadcast News.

ICASSP2017 Liping Chen, Kong-Aik Lee, Bin Ma 0001, Long Ma, Haizhou Li 0001, Li-Rong Dai 0001, 
Adaptation of PLDA for multi-source text-independent speaker verification.

#66  | Xixin Wu | Google Scholar   DBLP
Venues: Interspeech: 22; ICASSP: 15; TASLP: 3
Years: 2022: 9; 2021: 7; 2020: 8; 2019: 11; 2018: 5
ISCA Sections: speech synthesis: 5; speech and language in health: 1; spoofing-aware automatic speaker verification (sasv): 1; voice anti-spoofing and countermeasure: 1; non-autoregressive sequential modeling for speech processing: 1; speech recognition of atypical speech: 1; automatic speech recognition for non-native children’s speech: 1; speaker recognition: 1; spoken language evaluation: 1; learning techniques for speaker recognition: 1; asr neural network architectures: 1; neural techniques for voice conversion and waveform generation: 1; speech and audio classification: 1; lexicon and language model for speech recognition: 1; second language acquisition and code-switching: 1; voice conversion: 1; expressive speech synthesis: 1; application of asr in medical practice: 1
IEEE Keywords: speech recognition: 8; speaker recognition: 7; speech synthesis: 7; emotion recognition: 5; speech emotion recognition: 5; recurrent neural nets: 5; speech coding: 4; voice conversion: 3; deep learning (artificial intelligence): 2; dysarthric speech reconstruction: 2; optimisation: 2; natural language processing: 2; speech intelligibility: 2; code switching: 2; gaussian processes: 2; entropy: 2; convolutional neural nets: 2; unsupervised learning: 1; audio signal processing: 1; multitask learning: 1; speaker change detection: 1; unsupervised speech decomposition: 1; speaker identity: 1; adversarial speaker adaptation: 1; uniform sampling: 1; path dropout: 1; neural architecture search: 1; multi channel: 1; data handling: 1; speaker diarization: 1; m2met: 1; voice activity detection: 1; overlapped speech: 1; feature fusion: 1; signal sampling: 1; location relative attention: 1; signal representation: 1; signal reconstruction: 1; sequence to sequence modeling: 1; any to many: 1; residual error: 1; capsule: 1; exemplary emotion descriptor: 1; expressive speech synthesis: 1; spatial information: 1; recurrent: 1; capsule network: 1; sequential: 1; phonetic posteriorgrams: 1; x vector: 1; gmm i vector: 1; speaker verification: 1; adversarial attack: 1; accented speech recognition: 1; accent conversion: 1; cross modal: 1; seq2seq: 1; knowledge distillation: 1; end to end: 1; multilingual speech synthesis: 1; foreign accent: 1; human computer interaction: 1; center loss: 1; spectral analysis: 1; discriminative features: 1; activation function selection: 1; bayes methods: 1; gaussian process neural network: 1; inference mechanisms: 1; bayesian neural network: 1; quasifully recurrent neural network (qrnn): 1; variational inference: 1; text to speech (tts) synthesis: 1; parallel wavenet: 1; convolutional neural network (cnn): 1; parallel processing: 1; capsule networks: 1; spatial relationship information: 1; recurrent connection: 1; utterance level features: 1; rnnlms: 1; natural gradient: 1; gradient methods: 1; style adaptation: 1; regression analysis: 1; expressiveness: 1; speaking style: 1; style feature: 1
Most Publications: 2022: 28; 2020: 14; 2019: 14; 2021: 13; 2023: 8


ICASSP2022 Hang Su, Danyang Zhao, Long Dang, Minglei Li, Xixin Wu, Xunying Liu, Helen Meng, 
A Multitask Learning Framework for Speaker Change Detection with Content Information from Unsupervised Speech Decomposition.

ICASSP2022 Disong Wang, Songxiang Liu, Xixin Wu, Hui Lu, Lifa Sun, Xunying Liu, Helen Meng, 
Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation.

ICASSP2022 Xixin Wu, Shoukang Hu, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Neural Architecture Search for Speech Emotion Recognition.

ICASSP2022 Naijun Zheng, Na Li 0012, Xixin Wu, Lingwei Meng, Jiawen Kang, Haibin Wu, Chao Weng, Dan Su 0002, Helen Meng, 
The CUHK-Tencent Speaker Diarization System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.

Interspeech2022 Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu 0001, Helen Meng, 
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information.

Interspeech2022 Haohan Guo, Hui Lu, Xixin Wu, Helen Meng, 
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS.

Interspeech2022 Haohan Guo, Feng-Long Xie, Frank K. Soong, Xixin Wu, Helen Meng, 
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS.

Interspeech2022 Yi Wang, Tianzi Wang, Zi Ye, Lingwei Meng, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng, 
Exploring linguistic feature and model combination for speech recognition based automatic AD detection.

Interspeech2022 Haibin Wu, Lingwei Meng, Jiawen Kang, Jinchao Li, Xu Li, Xixin Wu, Hung-yi Lee, Helen Meng, 
Spoofing-Aware Speaker Verification by Multi-Level Fusion.

TASLP2021 Songxiang Liu, Yuewen Cao, Disong Wang, Xixin Wu, Xunying Liu, Helen Meng, 
Any-to-Many Voice Conversion With Location-Relative Sequence-to-Sequence Modeling.

TASLP2021 Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Shiyin Kang, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Exemplar-Based Emotive Speech Synthesis.

TASLP2021 Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Disong Wang, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Speech Emotion Recognition Using Sequential Capsule Networks.

Interspeech2021 Qingyun Dou, Xixin Wu, Moquan Wan, Yiting Lu, Mark J. F. Gales, 
Deliberation-Based Multi-Pass Speech Synthesis.

Interspeech2021 Xu Li, Xixin Wu, Hui Lu, Xunying Liu, Helen Meng, 
Channel-Wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks.

Interspeech2021 Hui Lu, Zhiyong Wu 0001, Xixin Wu, Xu Li, Shiyin Kang, Xunying Liu, Helen Meng, 
VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis.

Interspeech2021 Disong Wang, Songxiang Liu, Lifa Sun, Xixin Wu, Xunying Liu, Helen Meng, 
Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion.

ICASSP2020 Yuewen Cao, Songxiang Liu, Xixin Wu, Shiyin Kang, Peng Liu, Zhiyong Wu 0001, Xunying Liu, Dan Su 0002, Dong Yu 0001, Helen Meng, 
Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.

ICASSP2020 Xu Li, Jinghua Zhong, Xixin Wu, Jianwei Yu, Xunying Liu, Helen Meng, 
Adversarial Attacks on GMM I-Vector Based Speaker Verification Systems.

ICASSP2020 Songxiang Liu, Disong Wang, Yuewen Cao, Lifa Sun, Xixin Wu, Shiyin Kang, Zhiyong Wu 0001, Xunying Liu, Dan Su 0002, Dong Yu 0001, Helen Meng, 
End-To-End Accent Conversion Without Using Native Utterances.

ICASSP2020 Disong Wang, Jianwei Yu, Xixin Wu, Songxiang Liu, Lifa Sun, Xunying Liu, Helen Meng, 
End-To-End Voice Conversion Via Cross-Modal Knowledge Distillation for Dysarthric Speech Reconstruction.

#67  | Jing Xiao 0006 | Google Scholar   DBLP
Venues: Interspeech: 25; ICASSP: 14; ICML: 1
Years: 2022: 13; 2021: 16; 2020: 10; 2019: 1
ISCA Sections: speech synthesis: 8; topics in asr: 2; speech emotion recognition: 1; source separation: 1; novel models and training methods for asr: 1; spoken language modeling and understanding: 1; acoustic event detection and classification: 1; non-autoregressive sequential modeling for speech processing: 1; speech signal analysis and representation: 1; graph and end-to-end learning for speaker recognition: 1; embedding and network architecture for speaker recognition: 1; acoustic event detection and acoustic scene classification: 1; voice conversion and adaptation: 1; spoken language understanding: 1; dnn architectures for speaker recognition: 1; speech and audio quality assessment: 1; phonetic event detection and segmentation: 1
IEEE Keywords: speech synthesis: 8; natural language processing: 5; speaker recognition: 4; speech recognition: 4; voice conversion: 2; zero shot: 2; text analysis: 2; transformer: 2; text to speech: 2; regression analysis: 1; pattern classification: 1; variance regularization: 1; speaker age estimation: 1; attribute inference: 1; label distribution learning: 1; vector quantization: 1; contrastive learning: 1; any to any: 1; low resource: 1; self supervised: 1; object detection: 1; query processing: 1; patch embedding: 1; visual dialog: 1; multi modal: 1; computer vision: 1; pattern clustering: 1; question answering (information retrieval): 1; interactive systems: 1; self attention weight matrix: 1; incomplete utterance rewriting: 1; text edit: 1; synthetic noise: 1; adversarial perturbation: 1; contextual information: 1; grapheme to phoneme: 1; multi speaker text to speech: 1; conditional variational autoencoder: 1; computational linguistics: 1; continual learning: 1; intent detection: 1; slot filling: 1; data acquisition: 1; unsupervised: 1; instance discriminator: 1; information bottleneck: 1; unsupervised learning: 1; self attention: 1; rnn transducer: 1; recurrent neural nets: 1; linear dependency analysis: 1; network pruning: 1; pqr: 1; wireless channels: 1; feature maps: 1; matrix algebra: 1; convolutional codes: 1; waveform generation: 1; waveform generators: 1; location variable convolution: 1; vocoder: 1; convolution: 1; vocoders: 1; autoregressive processes: 1; non autoregressive: 1; generative flow: 1; graph theory: 1; speech coding: 1; prosody modelling: 1; graph neural network: 1
Most Publications: 2022: 120; 2021: 116; 2020: 106; 2019: 35; 2023: 18

Affiliations
PingAn Technology, Shenzhen, China

ICASSP2022 Shijing Si, Jianzong Wang, Junqing Peng, Jing Xiao 0006
Towards Speaker Age Estimation With Label Distribution Learning.

ICASSP2022 Huaizhen Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006
Avqvc: One-Shot Voice Conversion By Vector Quantization With Applying Contrastive Learning.

ICASSP2022 Qiqi Wang 0005, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006
DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning.

ICASSP2022 Tong Ye, Shijing Si, Jianzong Wang, Rui Wang, Ning Cheng 0001, Jing Xiao 0006
VU-BERT: A Unified Framework for Visual Dialog.

ICASSP2022 Yong Zhang, Zhitao Li, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006
Self-Attention for Incomplete Utterance Rewriting.

ICASSP2022 Chendong Zhao, Jianzong Wang, Xiaoyang Qu, Haoqian Wang, Jing Xiao 0006
r-G2P: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion by Controlled Noise Introducing and Contextual Information Incorporation.

ICASSP2022 Botao Zhao, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006
nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-Shot Multi-speaker text-to-speech.

Interspeech2022 Zuheng Kang, Junqing Peng, Jianzong Wang, Jing Xiao 0006
SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning.

Interspeech2022 Jian Luo, Jianzong Wang, Ning Cheng 0001, Edward Xiao, Xulong Zhang 0001, Jing Xiao 0006
Tiny-Sepformer: A Tiny Time-Domain Transformer Network For Speech Separation.

Interspeech2022 Chenfeng Miao, Ting Chen, Minchuan Chen, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006
A compact transformer-based GAN vocoder.

Interspeech2022 Chenfeng Miao, Kun Zou, Ziyang Zhuang, Tao Wei, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006
Towards Efficiently Learning Monotonic Alignments for Attention-based End-to-End Speech Recognition.

Interspeech2022 Ye Wang, Baishun Ling, Yanmeng Wang, Junhao Xue, Shaojun Wang, Jing Xiao 0006
Adversarial Knowledge Distillation For Robust Spoken Language Understanding.

Interspeech2022 Tong Ye, Shijing Si, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006
Uncertainty Calibration for Deep Audio Classifiers.

ICASSP2021 Yanfei Hui, Jianzong Wang, Ning Cheng 0001, Fengying Yu, Tianbo Wu, Jing Xiao 0006
Joint Intent Detection and Slot Filling Based on Continual Learning Model.

ICASSP2021 Shuang Liang, Chenfeng Miao, Minchuan Chen, Jun Ma 0018, Shaojun Wang, Jing Xiao 0006
Unsupervised Learning for Multi-Style Speech Synthesis with Limited Data.

ICASSP2021 Jian Luo, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006
Unidirectional Memory-Self-Attention Transducer for Online Speech Recognition.

ICASSP2021 Hao Pan, Zhongdi Chao, Jiang Qian, Bojin Zhuang, Shaojun Wang, Jing Xiao 0006
Network Pruning Using Linear Dependency Analysis on Feature Maps.

ICASSP2021 Zhen Zeng, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006
LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation.

Interspeech2021 Wei Chu, Peng Chang 0002, Jing Xiao 0006
Extending Pronunciation Dictionary with Automatically Detected Word Mispronunciations to Improve PAII's System for Interspeech 2021 Non-Native Child English Close Track ASR Challenge.

Interspeech2021 Ruchao Fan, Wei Chu, Peng Chang 0002, Jing Xiao 0006, Abeer Alwan, 
An Improved Single Step Non-Autoregressive Transformer for Automatic Speech Recognition.

#68  | Steve Renals | Google Scholar   DBLP
Venues: Interspeech: 23; ICASSP: 11; TASLP: 3; SpeechComm: 2
Years: 2022: 2; 2021: 8; 2020: 9; 2019: 7; 2017: 4; 2016: 9
ISCA Sections: feature extraction and distant asr: 3; robust speaker recognition: 1; topics in asr: 1; embedding and network architecture for speaker recognition: 1; diverse modes of speech acquisition and processing: 1; neural network training methods for asr: 1; evaluation of speech technology systems and methods for resource construction and annotation: 1; asr model training and strategies: 1; asr neural network training: 1; medical applications and visual asr: 1; model training for asr: 1; feature extraction for asr: 1; spoken language processing for children’s speech: 1; asr neural network architectures: 1; multi-lingual models and adaptation for asr: 1; spoken document processing: 1; language recognition: 1; acoustic model adaptation: 1; language model adaptation: 1; new trends in neural networks for speech recognition: 1; neural networks in speech recognition: 1
IEEE Keywords: speech recognition: 11; speaker recognition: 3; regression analysis: 2; acoustic modelling: 2; recurrent neural nets: 2; natural language processing: 2; end to end: 2; decoding: 2; gaussian processes: 2; unsupervised learning: 2; deep neural networks: 2; data augmentation: 1; vicinal risk minimization: 1; waveform based models: 1; out of distribution generalization: 1; raw phase spectrum: 1; asr: 1; multi head cnns: 1; sensor fusion: 1; phase based source filter separation: 1; top down training: 1; general classifier: 1; layer wise training: 1; language model: 1; domain adaptation: 1; multilingual speech recognition: 1; adversarial learning: 1; domain adversarial training: 1; speaker verification: 1; diarization: 1; deep neural network: 1; signal representation: 1; convolutional neural nets: 1; computer vision: 1; signal resolution: 1; low pass filters: 1; robust speech recognition: 1; transfer learning: 1; attention: 1; language translation: 1; punctuation: 1; encoding: 1; neural machine translation: 1; rich transcription: 1; small footprint: 1; highway deep neural networks: 1; mixture models: 1; knowledge distillation: 1; feedforward neural nets: 1; adaptation: 1; differentiable pooling: 1; speech coding: 1; encoder decoder: 1; recurrent neural networks: 1; hidden markov models: 1; end to end speech recognition: 1; lhuc: 1; sat: 1; maximum likelihood estimation: 1
Most Publications: 2019: 24; 2016: 21; 2020: 20; 2017: 19; 2021: 17


TASLP2022 Dino Oglic, Zoran Cvetkovic, Peter Sollich, Steve Renals, Bin Yu 0001, 
Towards Robust Waveform-Based Acoustic Models.

Interspeech2022 Chau Luu, Steve Renals, Peter Bell 0001, 
Investigating the contribution of speaker attributes to speaker separability using disentangled speaker representations.

SpeechComm2021 Manuel Sam Ribeiro, Joanne Cleland, Aciel Eshky, Korin Richmond, Steve Renals
Exploiting ultrasound tongue imaging for the automatic detection of speech articulation errors.

SpeechComm2021 Aciel Eshky, Joanne Cleland, Manuel Sam Ribeiro, Eleanor Sugden, Korin Richmond, Steve Renals
Automatic audiovisual synchronisation for ultrasound tongue imaging.

ICASSP2021 Erfan Loweimi, Zoran Cvetkovic, Peter Bell 0001, Steve Renals
Speech Acoustic Modelling from Raw Phase Spectrum.

ICASSP2021 Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell 0001, Steve Renals
Train Your Classifier First: Cascade Neural Networks Training from Upper Layers to Lower Layers.

Interspeech2021 Erfan Loweimi, Zoran Cvetkovic, Peter Bell 0001, Steve Renals
Speech Acoustic Modelling Using Raw Source and Filter Components.

Interspeech2021 Chau Luu, Peter Bell 0001, Steve Renals
Leveraging Speaker Attribute Information Using Multi Task Learning for Speaker Verification and Diarization.

Interspeech2021 Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals
Silent versus Modal Multi-Speaker Speech Recognition from Ultrasound and Video.

Interspeech2021 Shucong Zhang, Erfan Loweimi, Peter Bell 0001, Steve Renals
Stochastic Attention Head Removal: A Simple and Effective Method for Improving Transformer Based ASR Models.

ICASSP2020 Alberto Abad, Peter Bell 0001, Andrea Carmantini, Steve Renals
Cross Lingual Transfer Learning for Zero-Resource Domain Adaptation.

ICASSP2020 Chau Luu, Peter Bell 0001, Steve Renals
Channel Adversarial Training for Speaker Verification and Diarization.

ICASSP2020 Joanna Rownicka, Peter Bell 0001, Steve Renals
Multi-Scale Octave Convolutions for Robust Speech Recognition.

ICASSP2020 Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Steve Renals
Learning Noise Invariant Features Through Transfer Learning For Robust End-to-End Speech Recognition.

Interspeech2020 Ahmed Ali 0002, Steve Renals
Word Error Rate Estimation Without ASR Output: e-WER2.

Interspeech2020 Neethu M. Joy, Dino Oglic, Zoran Cvetkovic, Peter Bell 0001, Steve Renals
Deep Scattering Power Spectrum Features for Robust Speech Recognition.

Interspeech2020 Erfan Loweimi, Peter Bell 0001, Steve Renals
On the Robustness and Training Dynamics of Raw Waveform Models.

Interspeech2020 Erfan Loweimi, Peter Bell 0001, Steve Renals
Raw Sign and Magnitude Spectra for Multi-Head Acoustic Modelling.

Interspeech2020 Dino Oglic, Zoran Cvetkovic, Peter Bell 0001, Steve Renals
A Deep 2D Convolutional Network for Waveform-Based Speech Recognition.

ICASSP2019 Shucong Zhang, Erfan Loweimi, Peter Bell 0001, Steve Renals
Windowed Attention Mechanisms for Speech Recognition.

#69  | Simon King | Google Scholar   DBLP
Venues: Interspeech: 27; ICASSP: 8; TASLP: 4
Years: 2022: 3; 2021: 3; 2020: 6; 2019: 7; 2018: 5; 2017: 4; 2016: 11
ISCA Sections: speech synthesis: 14; text analysis, multilingual issues and evaluation in speech synthesis: 2; intelligibility-enhancing speech modification: 1; speech synthesis paradigms and methods: 1; speech intelligibility: 1; representation learning of emotion and paralinguistics: 1; prosody modeling and generation: 1; speech perception in adverse conditions: 1; voice conversion and speech synthesis: 1; glottal source modeling: 1; special session: 1; wavenet and novel paradigms: 1; robustness in speech processing: 1
IEEE Keywords: speech synthesis: 11; recurrent neural nets: 3; speaker recognition: 3; hidden markov models: 3; natural language processing: 2; linguistics: 2; spoofing attack: 2; filtering theory: 2; vocoders: 2; deep neural network: 2; acoustic modelling: 2; unit selection: 2; fundamental frequency: 1; neural network: 1; variational auto encoder: 1; speaker embedding: 1; multilingual: 1; subjective evaluation: 1; cross language: 1; tts: 1; dnn: 1; anti spoofing: 1; asvspoof: 1; automatic speaker verification: 1; security of data: 1; replay attacks: 1; convolutional neural network: 1; convolutional neural nets: 1; speech reconstruction: 1; neural vocoder: 1; regression analysis: 1; cross lingual speaker adaptation: 1; statistical speech synthesis: 1; speaker adaptation: 1; nearest neighbour: 1; bottleneck: 1; minimum generation error: 1; speaker verification: 1; statistics: 1; duration modelling: 1; robust statistics: 1; speech coding: 1; hybrid synthesis: 1; embedding: 1; deep neural networks: 1; electromagnetic articulography: 1; join cost: 1; hidden markov model: 1; decision trees: 1; decision tree: 1; gated recurrent network: 1; long short term memory: 1; recurrent network network: 1
Most Publications: 2013: 26; 2016: 21; 2010: 17; 2007: 17; 2014: 16

Affiliations
University of Edinburgh, Scotland, UK

Interspeech2022 Jason Fong, Daniel Lyth, Gustav Eje Henter, Hao Tang, Simon King
Speech Audio Corrector: using speech from non-target speakers for one-off correction of mispronunciations in grapheme-input text-to-speech.

Interspeech2022 Sébastien Le Maguer, Simon King, Naomi Harte, 
Back to the Future: Extending the Blizzard Challenge 2013.

Interspeech2022 Johannah O'Mahony, Catherine Lai, Simon King
Combining conversational speech with read speech to improve prosody in Text-to-Speech synthesis.

Interspeech2021 Devang S. Ram Mohan, Qinmin Vivian Hu, Tian Huey Teh, Alexandra Torresquintero, Christopher G. R. Wallis, Marlene Staib, Lorenzo Foglianti, Jiameng Gao, Simon King
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis.

Interspeech2021 Alexandra Torresquintero, Tian Huey Teh, Christopher G. R. Wallis, Marlene Staib, Devang S. Ram Mohan, Vivian Hu, Lorenzo Foglianti, Jiameng Gao, Simon King
ADEPT: A Dataset for Evaluating Prosody Transfer.

Interspeech2021 Cassia Valentini-Botinhao, Simon King
Detection and Analysis of Attention Errors in Sequence-to-Sequence Text-to-Speech.

TASLP2020 Xin Wang 0037, Shinji Takaki, Junichi Yamagishi, Simon King, Keiichi Tokuda, 
A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis.

ICASSP2020 Ivan Himawan, Sandesh Aryal, Iris Ouyang, Sam Kang, Pierre Lanchantin, Simon King
Speaker Adaptation of a Multilingual Acoustic Model for Cross-Language Synthesis.

Interspeech2020 Carol Chermaz, Simon King
A Sound Engineering Approach to Near End Listening Enhancement.

Interspeech2020 Jason Fong, Jason Taylor, Simon King
Testing the Limits of Representation Mixing for Pronunciation Correction in End-to-End Speech Synthesis.

Interspeech2020 Pilar Oplustil Gallegos, Jennifer Williams 0001, Joanna Rownicka, Simon King
An Unsupervised Method to Select a Speaker Subset from Large Multi-Speaker Speech Synthesis Datasets.

Interspeech2020 Jacob J. Webber, Olivier Perrotin, Simon King
Hider-Finder-Combiner: An Adversarial Architecture for General Speech Signal Modification.

ICASSP2019 Cheng-I Lai, Alberto Abad, Korin Richmond, Junichi Yamagishi, Najim Dehak, Simon King
Attentive Filtering Networks for Audio Replay Attack Detection.

ICASSP2019 Oliver Watts, Cassia Valentini-Botinhao, Simon King
Speech Waveform Reconstruction Using Convolutional Neural Networks with Noise and Periodic Inputs.

Interspeech2019 Adèle Aubin, Alessandra Cervone, Oliver Watts, Simon King
Improving Speech Synthesis with Discourse Relations.

Interspeech2019 Carol Chermaz, Cassia Valentini-Botinhao, Henning F. Schepker, Simon King
Evaluating Near End Listening Enhancement Algorithms in Realistic Environments.

Interspeech2019 Jason Fong, Pilar Oplustil Gallegos, Zack Hodari, Simon King
Investigating the Robustness of Sequence-to-Sequence Text-to-Speech Models to Imperfectly-Transcribed Training Data.

Interspeech2019 Avashna Govender, Anita E. Wagner, Simon King
Using Pupil Dilation to Measure Cognitive Load When Listening to Text-to-Speech in Quiet and in Noise.

Interspeech2019 Jennifer Williams 0001, Simon King
Disentangling Style Factors from Speaker Representations.

Interspeech2018 Avashna Govender, Simon King
Using Pupillometry to Measure the Cognitive Load of Synthetic Speech.

#70  | Alan W. Black | Google Scholar   DBLP
Venues: Interspeech (26), ICASSP (8), TASLP (2), ACL (2), AAAI (1)
Years: 2022 (8), 2021 (5), 2020 (5), 2019 (10), 2018 (4), 2017 (4), 2016 (3)
ISCA Sections: spoken language understanding (3); special session (2); low-resource asr development (1); human speech & signal processing (1); inclusive and fair speech technologies (1); speech processing & measurement (1); cross/multi-lingual and code-switched asr (1); spoken language processing (1); multilingual and code-switched asr (1); speech signal representation (1); speech and language analytics for mental health (1); the zero resource speech challenge 2019 (1); speech translation (1); cross-lingual and multilingual asr (1); speech annotation and labelling (1); speech technologies for code-switching in multilingual communities (1); the interspeech 2019 computational paralinguistics challenge (compare) (1); speech synthesis paradigms and methods (1); spoken dialogue systems and conversational analysis (1); the interspeech 2018 computational paralinguistics challenge (compare) (1); voice conversion (1); dialogue (1); low resource speech recognition (1)
IEEE Keywords: natural language processing (7); speech recognition (7); text analysis (5); speech synthesis (3); multilingual (2); linguistics (2); image retrieval (2); language translation (1); spoken language understanding (1); public domain software (1); open source (1); speech based user interfaces (1); long sequence modeling (1); speech summarization (1); concept learning (1); end to end (1); low resource (1); convolutional neural nets (1); long short term memory (1); intent (1); cross lingual (1); multilingual phonetic dataset (1); multilingual speech alignment (1); low resource speech recognition (1); human computer interaction (1); image representation (1); automatic speech recognition (1); unsupervised learning (1); universal phone recognition (1); multilingual speech recognition (1); phonology (1); found speech data (1); natural languages (1); ctc based decoding (1); multilingual language models (1); low resource asr (1); phoneme level language models (1); multi modal data (1); unwritten languages (1); unsupervised unit discovery (1); machine translation (1); regression analysis (1); pattern classification (1); gaussian processes (1); post filter (1); modulation spectrum (1); trees (mathematics) (1); smoothing methods (1); gmm based voice conversion (1); clustergen (1); hidden markov models (1); mixture models (1); global variance (1); oversmoothing (1); statistical parametric speech synthesis (1)
Most Publications: 2019 (46), 2020 (40), 2021 (35), 2018 (26), 2022 (22)


ICASSP2022 Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan 0003, Brian Yan, Ngoc Thang Vu, Alan W. Black, Shinji Watanabe 0001, 
ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet.

ICASSP2022 Roshan Sharma, Shruti Palaskar, Alan W. Black, Florian Metze, 
End-to-End Speech Summarization Using Restricted Self-Attention.

Interspeech2022 Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan W. Black, Shinji Watanabe 0001, 
Two-Pass Low Latency End-to-End Spoken Language Understanding.

Interspeech2022 Xinjian Li, Florian Metze, David R. Mortensen, Alan W. Black, Shinji Watanabe 0001, 
ASR2K: Speech Recognition for Around 2000 Languages without Audio.

Interspeech2022 Jiachen Lian, Alan W. Black, Louis Goldstein, Gopala Krishna Anumanchipalli, 
Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition.

Interspeech2022 Perez Ogayo, Graham Neubig, Alan W. Black
Building African Voices.

Interspeech2022 Peter Wu, Shinji Watanabe 0001, Louis Goldstein, Alan W. Black, Gopala Krishna Anumanchipalli, 
Deep Speech Synthesis from Articulatory Representations.

Interspeech2022 Hemant Yadav, Akshat Gupta, Sai Krishna Rallabandi, Alan W. Black, Rajiv Ratn Shah, 
Intent classification using pre-trained language agnostic embeddings for low resource languages.

ICASSP2021 Akshat Gupta, Xinjian Li, Sai Krishna Rallabandi, Alan W. Black
Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages.

ICASSP2021 Xinjian Li, David R. Mortensen, Florian Metze, Alan W. Black
Multilingual Phonetic Dataset for Low Resource Speech Recognition.

Interspeech2021 Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan 0002, Siddharth Dalmia, Florian Metze, Shinji Watanabe 0001, Alan W. Black
Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding.

Interspeech2021 Xinjian Li, Juncheng Li 0001, Florian Metze, Alan W. Black
Hierarchical Phone Recognition with Compositional Phonetics.

Interspeech2021 Shruti Palaskar, Ruslan Salakhutdinov, Alan W. Black, Florian Metze, 
Multimodal Speech Summarization Through Semantic Concept Learning.

TASLP2020 Odette Scharenborg, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller 0001, 
Speech Technology for Unwritten Languages.

ICASSP2020 Xinjian Li, Siddharth Dalmia, Juncheng Li 0001, Matthew Lee 0012, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W. Black, Florian Metze, 
Universal Phone Recognition with a Multilingual Allophone System.

Interspeech2020 Khyathi Raghavi Chandu, Alan W. Black
Style Variation as a Vantage Point for Code-Switching.

Interspeech2020 Amrith Setlur, Barnabás Póczos, Alan W. Black
Nonlinear ISA with Auxiliary Variables for Learning Speech Representations.

ACL2020 Elizabeth Salesky, Alan W. Black
Phone Features Improve Speech Translation.

ICASSP2019 Alan W. Black
CMU Wilderness Multilingual Speech Dataset.

ICASSP2019 Siddharth Dalmia, Xinjian Li, Alan W. Black, Florian Metze, 
Phoneme Level Language Models for Sequence Based Low Resource ASR.

#71  | Hiroshi Saruwatari | Google Scholar   DBLP
Venues: Interspeech (19), ICASSP (12), TASLP (6), SpeechComm (2)
Years: 2022 (8), 2021 (10), 2020 (13), 2019 (2), 2018 (2), 2017 (3), 2016 (1)
ISCA Sections: speech synthesis (9); speech synthesis paradigms and methods (2); speech coding and restoration (1); the voicemos challenge (1); spoken language processing (1); voice conversion and adaptation (1); speech annotation and speech assessment (1); speech signal representation (1); voice conversion (1); speech enhancement and noise reduction (1)
IEEE Keywords: speech synthesis (7); regression analysis (3); interpolation (3); filtering theory (3); optimisation (3); blind source separation (3); covariance matrices (3); text to speech synthesis (3); sound field interpolation (2); kernel ridge regression (2); active noise control (2); speaker embedding (2); speaker recognition (2); diffuse noise (2); blind speech extraction (2); spatial covariance matrix (2); acoustic field (2); numerical analysis (2); sound reproduction (2); gaussian distribution (2); loudspeakers (2); audio signal processing (2); gaussian processes (2); speech recognition (2); music (2); deep neural networks (2); voice conversion (2); generative adversarial networks (2); hilbert spaces (1); reproducing kernel hilbert space (1); transfer functions (1); principal component analysis (1); acoustic transfer function (1); helmholtz equation (1); spatial active noise control (1); adaptive filter (1); kernel interpolation (1); sound field control (1); microphone arrays (1); adaptive filters (1); perceptual speaker similarity (1); multi speaker generative modeling (1); deep speaker representation learning (1); active learning (1); estimation theory (1); em algorithm (1); multizone sound field control (1); acoustic variables control (1); personal audio (1); physics computing (1); amplitude matching (1); pressure matching (1); domain adaptation (1); mutual information (1); text analysis (1); cross lingual (1); multivariate complex generalized gaussian distribution (1); spatial noise (1); convolution (1); matrix decomposition (1); joint diagonalization (1); spatial covariance model (1); frequency domain analysis (1); student's t distribution (1); independent positive semidefinite tensor analysis (1); tensors (1); simple recurrent unit (1); sequential modeling (1); recurrent neural nets (1); bayesian deep model (1); deep gaussian process (1); discrete wavelet transform (1); discrete wavelet transforms (1); time domain audio source separation (1); source separation (1); wave u net (1); minimum phase filter (1); sub band processing (1); deep neural network (1); spectral differentials (1); hilbert transforms (1); sound field reproduction (1); mode matching method (1); spherical wavefunction expansion (1); modulation spectrum (1); inter utterance pitch variation (1); dnn based singing voice synthesis (1); moment matching network (1); artificial double tracking (1); over smoothing (1); statistical parametric speech synthesis (1); mean square error methods (1); vocoder free spss (1); multi resolution (1); fourier transform spectra (1); stft spectra (1); fourier transforms (1); vocoders (1); signal resolution (1); channel bank filters (1); generative adversarial training (1); dnn based speech synthesis (1); anti spoofing verification (1); multitask learning (1); training algorithm (1); formal verification (1)
Most Publications: 2021 (42), 2022 (40), 2020 (40), 2012 (23), 2005 (23)


TASLP2022 Juliano G. C. Ribeiro, Natsuki Ueno, Shoichi Koyama, Hiroshi Saruwatari
Region-to-Region Kernel Interpolation of Acoustic Transfer Functions Constrained by Physical Properties.

Interspeech2022 Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari
Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis.

Interspeech2022 Yuto Nishimura, Yuki Saito, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History.

Interspeech2022 Takaaki Saeki, Shinnosuke Takamichi, Tomohiko Nakamura, Naoko Tanji, Hiroshi Saruwatari
SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling.

Interspeech2022 Takaaki Saeki, Detai Xin, Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari
UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022.

Interspeech2022 Yuki Saito, Yuto Nishimura, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari
STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent.

Interspeech2022 Shinnosuke Takamichi, Wataru Nakata, Naoko Tanji, Hiroshi Saruwatari
J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis.

Interspeech2022 Kenta Udagawa, Yuki Saito, Hiroshi Saruwatari
Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS.

SpeechComm2021 Masashi Aso, Shinnosuke Takamichi, Norihiro Takamune, Hiroshi Saruwatari
Acoustic model-based subword tokenization and prosodic-context extraction without language knowledge for text-to-speech synthesis.

SpeechComm2021 Kentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari
Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation.

TASLP2021 Shoichi Koyama, Jesper Brunnström, Hayato Ito, Natsuki Ueno, Hiroshi Saruwatari
Spatial Active Noise Control Based on Kernel Interpolation of Sound Field.

TASLP2021 Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
Perceptual-Similarity-Aware Deep Speaker Representation Learning for Multi-Speaker Generative Modeling.

ICASSP2021 Yuto Kondo, Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari
Deficient Basis Estimation of Noise Spatial Covariance Matrix for Rank-Constrained Spatial Covariance Matrix Estimation Method in Blind Speech Extraction.

ICASSP2021 Shoichi Koyama, Takashi Amakasu, Natsuki Ueno, Hiroshi Saruwatari
Amplitude Matching: Majorization-Minimization Algorithm for Sound Field Control Only with Amplitude Constraint.

ICASSP2021 Detai Xin, Tatsuya Komatsu, Shinnosuke Takamichi, Hiroshi Saruwatari
Disentangled Speaker and Language Representations Using Mutual Information Minimization and Domain Adaptation for Cross-Lingual TTS.

Interspeech2021 Kazuki Mizuta, Tomoki Koriyama, Hiroshi Saruwatari
Harmonic WaveGAN: GAN-Based Speech Waveform Generation Model with Harmonic Structure Discriminator.

Interspeech2021 Taiki Nakamura, Tomoki Koriyama, Hiroshi Saruwatari
Sequence-to-Sequence Learning for Deep Gaussian Process Based Speech Synthesis Using Self-Attention GP Layer.

Interspeech2021 Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari
Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis.

TASLP2020 Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari
Blind Speech Extraction Based on Rank-Constrained Spatial Covariance Matrix Estimation With Multivariate Generalized Gaussian Distribution.

ICASSP2020 Hayato Ito, Shoichi Koyama, Natsuki Ueno, Hiroshi Saruwatari
Spatial Active Noise Control Based on Kernel Interpolation with Directional Weighting.

#72  | Chi-Chun Lee | Google Scholar   DBLP
Venues: Interspeech (28), ICASSP (11)
Years: 2022 (6), 2021 (1), 2020 (10), 2019 (9), 2018 (6), 2017 (4), 2016 (3)
ISCA Sections: speech emotion recognition (5); network architectures for emotion and paralinguistics recognition (2); social signals, styles, and interaction (2); (multimodal) speech emotion recognition (1); trustworthy speech processing (1); speech signal analysis and representation (1); speech in multimodality (1); diarization (1); computational paralinguistics (1); speech and language analytics for medical applications (1); acoustic phonetics (1); attention mechanism for speaker state recognition (1); the interspeech 2019 computational paralinguistics challenge (compare) (1); representation learning for emotion (1); speech pathology, depression, and medical applications (1); speech and language analytics for mental health (1); the interspeech 2018 computational paralinguistics challenge (compare) (1); deception, personality, and culture attribute (1); speaker states and traits (1); behavioral signal processing and speaker state and traits analytics (1); special session (1); speech and language processing for clinical health applications (1)
IEEE Keywords: emotion recognition (8); speech recognition (5); speech emotion recognition (4); natural language processing (2); soft label learning (2); affective multimedia (2); conversation (2); convolutional neural nets (2); signal classification (2); interactive systems (2); behavioral signal processing (bsp) (2); distribution label learning (1); multi label learning (1); auditory saliency (1); transformer (1); audio signal processing (1); cognition (1); image classification (1); adversarial domain adaptation (1); biomedical mri (1); diseases (1); multi site transfer (1); fmri (1); unsupervised learning (1); medical image processing (1); adhd (1); neurophysiology (1); regression analysis (1); mean square error methods (1); graph convolutional network (1); decision making (1); group performance (1); small group interaction (1); behavioural sciences computing (1); personality recognition (1); physiology (1); image representation (1); graph theory (1); video signal processing (1); graph convolution network (1); decoding (1); dialogical emotion decoder (1); adversarial network (1); cross corpus learning (1); blstm (1); annotator modeling (1); spoken dialogs (1); human computer interaction (1); attention mechanism (1); interaction (1); story telling (1); autism spectrum disorder (asd) (1); text analysis (1); medical disorders (1); long short term memory neural network (lstm) (1); lexical coherence (1); medical diagnostic computing (1); natural language interfaces (1); multi task learning (1); cross language (1); affective computing (1)
Most Publications: 2019 (26), 2020 (22), 2018 (16), 2022 (15), 2017 (13)


ICASSP2022 Huang-Cheng Chou, Wei-Cheng Lin, Chi-Chun Lee, Carlos Busso, 
Exploiting Annotators' Typed Description of Emotion Perception to Maximize Utilization of Ratings for Speech Emotion Recognition.

ICASSP2022 Ya-Tse Wu, Jeng-Lin Li, Chi-Chun Lee
An Audio-Saliency Masking Transformer for Audio Emotion Classification in Movies.

Interspeech2022 Chun-Yu Chen, Yun-Shao Lin, Chi-Chun Lee
Emotion-Shift Aware CRF for Decoding Emotion Sequence in Conversation.

Interspeech2022 Huang-Cheng Chou, Chi-Chun Lee, Carlos Busso, 
Exploiting Co-occurrence Frequency of Emotions in Perceptual Evaluations To Train A Speech Emotion Classifier.

Interspeech2022 Yu-Lin Huang, Bo-Hao Su, Y.-W. Peter Hong, Chi-Chun Lee
An Attention-Based Method for Guiding Attribute-Aligned Speech Representation Learning.

Interspeech2022 Bo-Hao Su, Chi-Chun Lee
Vaccinating SER to Neutralize Adversarial Attacks with Self-Supervised Augmentation Strategy.

Interspeech2021 Yu-Lin Huang, Bo-Hao Su, Y.-W. Peter Hong, Chi-Chun Lee
An Attribute-Aligned Strategy for Learning Speech Representation.

ICASSP2020 Ya-Lin Huang, Wan-Ting Hsieh, Hao-Chun Yang, Chi-Chun Lee
Conditional Domain Adversarial Transfer for Robust Cross-Site ADHD Classification Using Functional MRI.

ICASSP2020 Yun-Shao Lin, Chi-Chun Lee
Predicting Performance Outcome with a Conversational Graph Convolutional Network for Small Group Interactions.

ICASSP2020 Hao-Chun Yang, Chi-Chun Lee
A Siamese Content-Attentive Graph Convolutional Network for Personality Recognition Using Physiology.

ICASSP2020 Sung-Lin Yeh, Yun-Shao Lin, Chi-Chun Lee
A Dialogical Emotion Decoder for Speech Emotion Recognition in Spoken Dialog.

Interspeech2020 Huang-Cheng Chou, Chi-Chun Lee
Learning to Recognize Per-Rater's Emotion Perception Using Co-Rater Training Strategy with Soft and Hard Labels.

Interspeech2020 Jeng-Lin Li, Chi-Chun Lee
Using Speaker-Aligned Graph Memory Block in Multimodally Attentive Emotion Recognition Network.

Interspeech2020 Bo-Hao Su, Chun-Min Chang, Yun-Shao Lin, Chi-Chun Lee
Improving Speech Emotion Recognition Using Graph Attentive Bi-Directional Gated Recurrent Unit Network.

Interspeech2020 Shreya G. Upadhyay, Bo-Hao Su, Chi-Chun Lee
Attentive Convolutional Recurrent Neural Network Using Phoneme-Level Acoustic Representation for Rare Sound Event Detection.

Interspeech2020 Sung-Lin Yeh, Yun-Shao Lin, Chi-Chun Lee
Speech Representation Learning for Emotion Recognition Using End-to-End ASR with Factorized Adaptation.

Interspeech2020 Shun-Chang Zhong, Bo-Hao Su, Wei Huang, Yi-Ching Liu, Chi-Chun Lee
Predicting Collaborative Task Performance Using Graph Interlocutor Acoustic Network in Small Group Interaction.

ICASSP2019 Chun-Min Chang, Chi-Chun Lee
Adversarially-enriched Acoustic Code Vector Learned from Out-of-context Affective Corpus for Robust Emotion Recognition.

ICASSP2019 Huang-Cheng Chou, Chi-Chun Lee
Every Rating Matters: Joint Learning of Subjective Labels and Individual Annotators for Speech Emotion Classification.

ICASSP2019 Sung-Lin Yeh, Yun-Shao Lin, Chi-Chun Lee
An Interaction-aware Attention Network for Speech Emotion Recognition in Spoken Dialogs.

#73  | Brian Kingsbury | Google Scholar   DBLP
Venues: Interspeech (21), ICASSP (18)
Years: 2022 (10), 2021 (10), 2020 (5), 2019 (6), 2018 (1), 2017 (2), 2016 (5)
ISCA Sections: spoken language understanding (2); asr neural network training (2); novel models and training methods for asr (1); neural transducers, streaming asr and novel asr models (1); multi-, cross-lingual and other topics in asr (1); asr (1); spoken language modeling and understanding (1); streaming for asr/rnn transducers (1); neural network training methods for asr (1); language and lexical modeling for asr (1); multimodal systems (1); low-resource speech recognition (1); novel neural network architectures for asr (1); multilingual and code-switched asr (1); summarization, semantic analysis and classification (1); asr neural network architectures and training (1); resources – annotation – evaluation (1); neural networks in speech recognition (1); low resource speech recognition (1)
IEEE Keywords: speech recognition (13); recurrent neural nets (7); natural language processing (7); spoken language understanding (6); automatic speech recognition (6); text analysis (4); end to end systems (3); data handling (2); decoding (2); rnn transducers (2); end to end models (2); end to end asr (2); lstm (2); keyword search (2); multilingual (2); acoustic modeling (2); software agents (1); virtual reality (1); intent classification (1); weakly supervised learning (1); text classification (1); nearest neighbour methods (1); nearest neighbors (1); voice conversations (1); speech coding (1); encoder decoder (1); atis (1); transducers (1); attention (1); spoken dialog system (1); interactive systems (1); natural languages (1); end to end models (1); language model customization (1); adaptation (1); data analysis (1); end to end systems (1); transformer networks (1); self supervised pre training (1); recurrent neural network transducer (1); multiplicative integration (1); sensor fusion (1); speaker recognition (1); speech to intent (1); synthetic speech augmentation (1); pre trained text embedding (1); noise injection (1); broadcast news (1); deep neural networks (1); parallel computing (1); graphics processing units (1); switchboard (1); parallel processing (1); hidden markov models (1); direct acoustics to word models (1); query processing (1); audio coding (1); feedforward neural nets (1); acoustic model (1); vgg (1); regression analysis (1); one vs one multi class classification (1); random fourier features (1); large scale kernel machines (1); deep neural networks (1); feature selection (1); kernel methods (1); logistic regression (1); convolutional networks (1)
Most Publications: 2021 (26), 2022 (23), 2013 (17), 2019 (14), 2020 (13)


ICASSP2022 Zvi Kons, Aharon Satt, Hong-Kwang Kuo, Samuel Thomas 0001, Boaz Carmeli, Ron Hoory, Brian Kingsbury
A New Data Augmentation Method for Intent Classification Enhancement and its Application on Spoken Conversation Datasets.

ICASSP2022 Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas 0001, Brian Kingsbury, George Saon, 
Improving End-to-end Models for Set Prediction in Spoken Language Understanding.

ICASSP2022 Vishal Sunder, Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Jatin Ganhotra, Brian Kingsbury, Eric Fosler-Lussier, 
Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding.

ICASSP2022 Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Brian Kingsbury, George Saon, 
Towards Reducing the Need for Speech Training Data to Build Spoken Language Understanding Systems.

ICASSP2022 Samuel Thomas 0001, Brian Kingsbury, George Saon, Hong-Kwang Jeff Kuo, 
Integrating Text Inputs for Training and Adapting RNN Transducer ASR Models.

Interspeech2022 Xiaodong Cui, George Saon, Tohru Nagano, Masayuki Suzuki, Takashi Fukuda, Brian Kingsbury, Gakuto Kurata, 
Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing.

Interspeech2022 Andrea Fasoli, Chia-Yu Chen, Mauricio J. Serrano, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Kailash Gopalakrishnan, 
Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization.

Interspeech2022 Takashi Fukuda, Samuel Thomas 0001, Masayuki Suzuki, Gakuto Kurata, George Saon, Brian Kingsbury
Global RNN Transducer Models For Multi-dialect Speech Recognition.

Interspeech2022 Jiatong Shi, George Saon, David Haws, Shinji Watanabe 0001, Brian Kingsbury
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States.

Interspeech2022 Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas 0001, Hong-Kwang Kuo, Brian Kingsbury
Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems.

ICASSP2021 Samuel Thomas 0001, Hong-Kwang Jeff Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, 
RNN Transducer Models for Spoken Language Understanding.

ICASSP2021 Edmilson da Silva Morais, Hong-Kwang Jeff Kuo, Samuel Thomas 0001, Zoltán Tüske, Brian Kingsbury
End-to-End Spoken Language Understanding Using Transformer Networks and Self-Supervised Pre-Trained Features.

ICASSP2021 George Saon, Zoltán Tüske, Daniel Bolaños, Brian Kingsbury
Advancing RNN Transducer Technology for Speech Recognition.

Interspeech2021 Xiaodong Cui, Brian Kingsbury, George Saon, David Haws, Zoltán Tüske, 
Reducing Exposure Bias in Training Recurrent Neural Network Transducers.

Interspeech2021 Andrea Fasoli, Chia-Yu Chen, Mauricio J. Serrano, Xiao Sun, Naigang Wang, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Wei Zhang 0022, Zoltán Tüske, Kailash Gopalakrishnan, 
4-Bit Quantization of LSTM-Based Speech Recognition Models.

Interspeech2021 Jatin Ganhotra, Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Sachindra Joshi, George Saon, Zoltán Tüske, Brian Kingsbury
Integrating Dialog History into End-to-End Spoken Language Understanding Systems.

Interspeech2021 Gakuto Kurata, George Saon, Brian Kingsbury, David Haws, Zoltán Tüske, 
Improving Customization of Neural Transducers by Mitigating Acoustic Mismatch of Synthesized Audio.

Interspeech2021 Andrew Rouditchenko, Angie W. Boggust, David Harwath, Samuel Thomas 0001, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogério Feris, Brian Kingsbury, Michael Picheny, James R. Glass, 
Cascaded Multilingual Audio-Visual Learning from Videos.

Interspeech2021 Andrew Rouditchenko, Angie W. Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas 0001, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogério Schmidt Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba 0001, James R. Glass, 
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.

Interspeech2021 Zoltán Tüske, George Saon, Brian Kingsbury
On the Limit of English Conversational Speech Recognition.

#74  | Jianwei Yu | Google Scholar   DBLP
VenuesICASSP: 17Interspeech: 17TASLP: 5
Years2022: 82021: 122020: 72019: 92018: 3
ISCA Sectionspeech recognition of atypical speech: 3topics in asr: 2speech synthesis: 1spoken dialogue systems and multimodality: 1multi-, cross-lingual and other topics in asr: 1acoustic event detection and classification: 1source separation, dereverberation and echo cancellation: 1multimodal speech processing: 1speech and speaker recognition: 1asr neural network architectures: 1medical applications and visual asr: 1lexicon and language model for speech recognition: 1novel neural network architectures for acoustic modelling: 1application of asr in medical practice: 1
IEEE Keywords: speech recognition: 15, speaker recognition: 7, bayes methods: 5, speech separation: 5, optimisation: 5, natural language processing: 5, recurrent neural nets: 5, quantisation (signal): 4, deep learning (artificial intelligence): 3, multi channel: 3, overlapped speech: 3, language models: 3, gradient methods: 3, gaussian processes: 3, audio visual systems: 2, audio visual: 2, speech intelligibility: 2, multi look: 2, variational inference: 2, inference mechanisms: 2, admm: 2, transformer: 2, quantization: 2, speaker verification: 2, speech synthesis: 2, entropy: 2, neural net architecture: 1, time delay neural network: 1, search problems: 1, minimisation: 1, uncertainty handling: 1, neural architecture search: 1, speech enhancement: 1, dereverberation and recognition: 1, reverberation: 1, neural network quantization: 1, mean square error methods: 1, mixed precision: 1, source separation: 1, direction of arrival: 1, speaker diarization: 1, direction of arrival estimation: 1, bayesian learning: 1, domain adaptation: 1, gaussian process: 1, lf mmi: 1, delays: 1, generalisation (artificial intelligence): 1, handicapped aids: 1, data augmentation: 1, speaker adaptation: 1, multimodal speech recognition: 1, disordered speech recognition: 1, low bit quantization: 1, lstm rnn: 1, filtering theory: 1, jointly fine tuning: 1, microphone arrays: 1, visual occlusion: 1, overlapped speech recognition: 1, image recognition: 1, video signal processing: 1, alzheimer's disease detection: 1, features: 1, cognition: 1, adress: 1, medical diagnostic computing: 1, geriatrics: 1, asr: 1, diseases: 1, signal classification: 1, patient diagnosis: 1, linguistics: 1, elderly speech: 1, automatic speech recognition: 1, neurocognitive disorder detection: 1, dementia: 1, x vector: 1, gmm i vector: 1, adversarial attack: 1, cross modal: 1, dysarthric speech reconstruction: 1, voice conversion: 1, seq2seq: 1, knowledge distillation: 1, data compression: 1, recurrent neural networks: 1, alternating direction methods of multipliers: 1, audio visual speech recognition: 1, multi modal: 1, speech coding: 1, end to end: 1, multilingual speech synthesis: 1, foreign accent: 1, code switching: 1, activation function selection: 1, gaussian process neural network: 1, bayesian neural network: 1, lstm: 1, neural network language models: 1, parameter estimation: 1, emotion recognition: 1, capsule networks: 1, convolutional neural nets: 1, speech emotion recognition: 1, spatial relationship information: 1, recurrent connection: 1, utterance level features: 1, rnnlms: 1, natural gradient: 1, limited memory bfgs: 1, second order optimization: 1, hessian matrices: 1, recurrent neural network: 1, language model: 1
Most Publications: 2022: 28, 2021: 26, 2020: 13, 2019: 11, 2023: 6

Affiliations
URLs

TASLP2022 Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.

ICASSP2022 Guinan Li, Jianwei Yu, Jiajun Deng, Xunying Liu, Helen Meng, 
Audio-Visual Multi-Channel Speech Separation, Dereverberation and Recognition.

ICASSP2022 Junhao Xu, Jianwei Yu, Xunying Liu, Helen Meng, 
Mixed Precision DNN Quantization for Overlapped Speech Separation and Recognition.

ICASSP2022 Naijun Zheng, Na Li 0012, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
Multi-Channel Speaker Diarization Using Spatial Features for Meetings.

Interspeech2022 Ziqian Dai, Jianwei Yu, Yan Wang, Nuo Chen, Yanyao Bian, Guangzhi Li, Deng Cai 0002, Dong Yu 0001, 
Automatic Prosody Annotation with Pre-Trained Text-Speech Model.

Interspeech2022 Lingyun Feng, Jianwei Yu, Yan Wang, Songxiang Liu, Deng Cai 0002, Haitao Zheng, 
ASR-Robust Natural Language Understanding on ASR-GLUE dataset.

Interspeech2022 Jinchuan Tian, Jianwei Yu, Chunlei Zhang, Yuexian Zou, Dong Yu 0001, 
LAE: Language-Aware Encoder for Monolingual and Multilingual ASR.

Interspeech2022 Helin Wang, Dongchao Yang, Chao Weng, Jianwei Yu, Yuexian Zou, 
Improving Target Sound Extraction with Timestamp Information.

TASLP2021 Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng, 
Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition.

TASLP2021 Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng, 
Recent Progress in the CUHK Dysarthric Speech Recognition System.

TASLP2021 Junhao Xu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng, 
Mixed Precision Low-Bit Quantization of Neural Network Language Models for Speech Recognition.

TASLP2021 Jianwei Yu, Shi-Xiong Zhang, Bo Wu, Shansong Liu, Shoukang Hu, Mengzhe Geng, Xunying Liu, Helen Meng, Dong Yu 0001, 
Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech.

ICASSP2021 Jinchao Li, Jianwei Yu, Zi Ye, Simon Wong, Man-Wai Mak, Brian Mak, Xunying Liu, Helen Meng, 
A Comparative Study of Acoustic and Linguistic Features Classification for Alzheimer's Disease Detection.

ICASSP2021 Junhao Xu, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen Meng, 
Mixed Precision Quantization of Transformer Language Models for Speech Recognition.

ICASSP2021 Zi Ye, Shoukang Hu, Jinchao Li, Xurong Xie, Mengzhe Geng, Jianwei Yu, Junhao Xu, Boyang Xue, Shansong Liu, Xunying Liu, Helen Meng, 
Development of the Cuhk Elderly Speech Recognition System for Neurocognitive Disorder Detection Using the Dementiabank Corpus.

ICASSP2021 Naijun Zheng, Na Li 0012, Bo Wu, Meng Yu 0003, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
A Joint Training Framework of Multi-Look Separator and Speaker Embedding Extractor for Overlapped Speech.

Interspeech2021 Jiajun Deng, Fabian Ritter Gutierrez, Shoukang Hu, Mengzhe Geng, Xurong Xie, Zi Ye, Shansong Liu, Jianwei Yu, Xunying Liu, Helen Meng, 
Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and Dysarthric Speech Recognition.

Interspeech2021 Mengzhe Geng, Shansong Liu, Jianwei Yu, Xurong Xie, Shoukang Hu, Zi Ye, Zengrui Jin, Xunying Liu, Helen Meng, 
Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition.

Interspeech2021 Zengrui Jin, Mengzhe Geng, Xurong Xie, Jianwei Yu, Shansong Liu, Xunying Liu, Helen Meng, 
Adversarial Data Augmentation for Disordered Speech Recognition.

Interspeech2021 Helin Wang, Bo Wu, Lianwu Chen, Meng Yu 0003, Jianwei Yu, Yong Xu 0004, Shi-Xiong Zhang, Chao Weng, Dan Su 0002, Dong Yu 0001, 
TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation.

#75  | Andreas Stolcke | Google Scholar   DBLP
Venues: ICASSP: 17, Interspeech: 16, TASLP: 3, EMNLP: 1, SpeechComm: 1
Years: 2022: 11, 2021: 10, 2020: 3, 2019: 2, 2018: 3, 2017: 7, 2016: 2
ISCA Sections: speaker recognition and anti-spoofing: 2, inclusive and fair speech technologies: 2, speaker recognition: 2, speaker diarization: 1, multi- and cross-lingual asr, other topics in asr: 1, self-supervision and semi-supervision for neural asr training: 1, training strategies for asr: 1, speaker recognition challenges and applications: 1, rich transcription and asr systems: 1, speech recognition for language learning: 1, speech recognition: 1, conversational telephone speech recognition: 1, neural networks in speech recognition: 1
IEEE Keywords: speech recognition: 10, speaker recognition: 7, recurrent neural nets: 7, natural language processing: 5, blstm: 3, conversational speech recognition: 3, lace: 3, embedding adaptation: 2, speaker verification: 2, optimisation: 2, neural net architecture: 2, feedforward neural nets: 2, resnet: 2, vgg: 2, recurrent neural networks: 2, convolutional neural networks: 2, iterative methods: 2, supervised learning: 1, self supervised training: 1, signal representation: 1, human computer interaction: 1, rejection mechanism: 1, dialogue: 1, interactive systems: 1, diarization: 1, multi task learning: 1, automatic speech recognition: 1, speaker identification: 1, few shot open set learning: 1, model fairness: 1, score fusion: 1, deep learning (artificial intelligence): 1, second pass rescoring: 1, bert: 1, minimum wer training: 1, pretrained model: 1, masked language model: 1, speech recognition safety: 1, adversarial robustness: 1, sequence modeling: 1, bayes methods: 1, robust speech recognition: 1, pattern classification: 1, mixup: 1, metric learning: 1, prototypical loss: 1, interpolation: 1, speaker diarization: 1, online inference: 1, pattern clustering: 1, computational complexity: 1, encoder decoder attractor: 1, end to end neural diarization: 1, multi accent asr: 1, domain adversarial training: 1, end to end asr: 1, accent invariance: 1, rnn transducer: 1, emotion recognition: 1, contrastive predictive coding: 1, unsupervised pre training: 1, unsupervised learning: 1, speech emotion recognition: 1, recurrent neural network transducer: 1, multilingual: 1, language identification: 1, joint modeling: 1, code switching: 1, neural interfaces: 1, reinforce: 1, entropy: 1, multitask training: 1, spoken language understanding: 1, meeting understanding: 1, feature fusion: 1, hot spots: 1, involvement: 1, acoustic modeling: 1, language modeling: 1, cepstral analysis: 1, sentiment analysis: 1, multimodal fusion: 1, audio feature extraction: 1, children's reading: 1, mispronunciation detection: 1, speech analysis: 1, automatic reading annotation: 1, system combination: 1, lstm lm: 1, convolution: 1, human parity: 1, cnn: 1, spatial smoothing: 1, smoothing methods: 1, recurrent neural network: 1, end to end training: 1, ctc: 1, eigenfilter: 1, alignment: 1, adaptive: 1, mobile handsets: 1, audio recording: 1, audio fingerprint: 1, meetings: 1, greedy algorithms: 1
Most Publications: 2022: 26, 2021: 25, 2006: 18, 2005: 18, 2004: 16


ICASSP2022 Metehan Cekic, Ruirui Li 0002, Zeya Chen, Yuguang Yang 0004, Andreas Stolcke, Upamanyu Madhow, 
Self-Supervised Speaker Recognition Training using Human-Machine Dialogues.

ICASSP2022 Aparna Khare, Eunjung Han, Yuguang Yang 0004, Andreas Stolcke
ASR-Aware End-to-End Neural Diarization.

ICASSP2022 K. C. Kishan, Zhenning Tan, Long Chen, Minho Jin, Eunjung Han, Andreas Stolcke, Chul Lee, 
OpenFEAT: Improving Speaker Identification by Open-Set Few-Shot Embedding Adaptation with Transformer.

ICASSP2022 Hua Shen, Yuguang Yang 0004, Guoli Sun, Ryan Langman, Eunjung Han, Jasha Droppo, Andreas Stolcke
Improving Fairness in Speaker Verification via Group-Adapted Fusion Network.

ICASSP2022 Liyan Xu, Yile Gu, Jari Kolehmainen, Haidar Khan, Ankur Gandhe, Ariya Rastrow, Andreas Stolcke, Ivan Bulyko, 
RescoreBERT: Discriminative Speech Recognition Rescoring With Bert.

ICASSP2022 Chao-Han Huck Yang, Zeeshan Ahmed, Yile Gu, Joseph Szurley, Roger Ren, Linda Liu, Andreas Stolcke, Ivan Bulyko, 
Mitigating Closed-Model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition.

ICASSP2022 Xin Zhang, Minho Jin, Roger Cheng, Ruirui Li 0002, Eunjung Han, Andreas Stolcke
Contrastive-mixup Learning for Improved Speaker Verification.

Interspeech2022 Long Chen, Yixiong Meng, Venkatesh Ravichandran, Andreas Stolcke
Graph-based Multi-View Fusion and Local Adaptation: Mitigating Within-Household Confusability for Speaker Identification.

Interspeech2022 Pranav Dheram, Murugesan Ramakrishnan, Anirudh Raju, I-Fan Chen, Brian King, Katherine Powell, Melissa Saboowala, Karan Shetty, Andreas Stolcke
Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities.

Interspeech2022 Minho Jin, Chelsea Ju, Zeya Chen, Yi-Chieh Liu, Jasha Droppo, Andreas Stolcke
Adversarial Reweighting for Speaker Verification Fairness.

Interspeech2022 Viet Anh Trinh, Pegah Ghahremani, Brian King, Jasha Droppo, Andreas Stolcke, Roland Maas, 
Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation.

ICASSP2021 Eunjung Han, Chul Lee, Andreas Stolcke
BW-EDA-EEND: streaming END-TO-END Neural Speaker Diarization for a Variable Number of Speakers.

ICASSP2021 Hu Hu, Xuesong Yang, Zeynab Raeesy, Jinxi Guo, Gokce Keskin, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Roland Maas, 
REDAT: Accent-Invariant Representation for End-To-End ASR by Domain Adversarial Training with Relabeling.

ICASSP2021 Mao Li, Bo Yang, Joshua Levy, Andreas Stolcke, Viktor Rozgic, Spyros Matsoukas, Constantinos Papayiannis, Daniel Bone, Chao Wang 0018, 
Contrastive Unsupervised Learning for Speech Emotion Recognition.

ICASSP2021 Surabhi Punjabi, Harish Arsikere, Zeynab Raeesy, Chander Chandak, Nikhil Bhave, Ankish Bansal, Markus Müller, Sergio Murillo, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Sri Garimella, Roland Maas, Mat Hans, Athanasios Mouchtaris, Siegfried Kunzmann, 
Joint ASR and Language Identification Using RNN-T: An Efficient Approach to Dynamic Language Switching.

ICASSP2021 Milind Rao, Pranav Dheram, Gautam Tiwari, Anirudh Raju, Jasha Droppo, Ariya Rastrow, Andreas Stolcke
DO as I Mean, Not as I Say: Sequence Loss Training for Spoken Language Understanding.

Interspeech2021 Long Chen, Venkatesh Ravichandran, Andreas Stolcke
Graph-Based Label Propagation for Semi-Supervised Speaker Identification.

Interspeech2021 Ruirui Li 0002, Chelsea J.-T. Ju, Zeya Chen, Hongda Mao, Oguz Elibol, Andreas Stolcke
Fusion of Embeddings Networks for Robust Combination of Text Dependent and Independent Speaker Recognition.

Interspeech2021 Yi-Chieh Liu, Eunjung Han, Chul Lee, Andreas Stolcke
End-to-End Neural Diarization: From Transformer to Conformer.

Interspeech2021 Swayambhu Nath Ray, Minhua Wu, Anirudh Raju, Pegah Ghahremani, Raghavendra Bilgi, Milind Rao, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, 
Listen with Intent: Improving Speech Recognition with Audio-to-Intent Front-End.

#76  | Ian McLoughlin 0001 | Google Scholar   DBLP
Venues: Interspeech: 22, ICASSP: 12, TASLP: 4
Years: 2022: 5, 2021: 7, 2020: 8, 2019: 10, 2018: 6, 2017: 1, 2016: 1
ISCA Sections: speaker and language recognition: 3, neural transducers, streaming asr and novel asr models: 1, resource-constrained asr: 1, language and accent recognition: 1, acoustic event detection and acoustic scene classification: 1, asr neural network architectures: 1, learning techniques for speaker recognition: 1, speech and voice disorders: 1, asr neural network architectures and training: 1, acoustic event detection: 1, speaker recognition and diarization: 1, speech synthesis: 1, speech and audio classification: 1, audio signal characterization: 1, speaker verification using neural network methods: 1, representation learning for emotion: 1, acoustic scenes and rare events: 1, novel neural network architectures for acoustic modelling: 1, language recognition: 1, speech and audio segmentation and classification: 1
IEEE Keywords: speech recognition: 6, speaker recognition: 5, convolutional neural nets: 3, signal classification: 3, speech separation: 3, recurrent neural nets: 3, audio signal processing: 3, knowledge based systems: 2, supervised learning: 2, speaker verification: 2, deep learning (artificial intelligence): 2, source separation: 2, sound event detection: 2, audio tagging: 2, label permutation problem: 2, representation learning: 1, anomalous sound detection: 1, self supervised learning: 1, end to end: 1, unsupervised domain adaptation: 1, label smoothing: 1, knowledge distillation: 1, emotion recognition: 1, convolutional neural network: 1, signal reconstruction: 1, style transformation: 1, speech emotion recognition: 1, disentanglement: 1, probability: 1, sequence alignment: 1, encoder decoder: 1, post inference: 1, inference mechanisms: 1, end to end asr: 1, multi granularity: 1, embedding learning: 1, dense residual networks: 1, model ensemble: 1, self attention: 1, gan: 1, speech enhancement: 1, deconvolution: 1, generative adversarial network: 1, convolution: 1, segan: 1, pattern classification: 1, music classification: 1, multi view learning: 1, music: 1, audio classification: 1, gradient blending: 1, vocal source excitation: 1, laryngectomy: 1, patient rehabilitation: 1, medical computing: 1, speech synthesis: 1, medical disorders: 1, glottal flow model: 1, whisper to speech conversion: 1, signal representation: 1, speaker identification: 1, time domain: 1, target tracking: 1, time domain analysis: 1, sparse encoder: 1, semi supervised learning: 1, weakly labeled: 1, autoregressive processes: 1, computational auditory scene analysis: 1, iterative methods: 1, glottal flow: 1, glottal inverse filtering: 1, voice quality: 1, spectral tilt: 1, spectral model: 1, adaptive filters: 1, estimation theory: 1, overlapping sound: 1, convolutional recurrent neural network: 1, isolated sound: 1, multi task: 1, audio event detection: 1, multi label: 1, weakly labelled data: 1, attention: 1, statistics: 1, language identification deep neural network i vector lid senones: 1, natural language processing: 1
Most Publications: 2020: 25, 2021: 23, 2019: 14, 2016: 14, 2014: 12

Affiliations
Singapore Institute of Technology, Singapore
University of Science and Technology of China, Hefei, China
University of Kent, UK (2015 - 2019)
Nanyang Technological University, Singapore (former)
University of Birmingham, UK (PhD 1997)

ICASSP2022 Han Chen, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Self-Supervised Representation Learning for Unsupervised Anomalous Sound Detection Under Domain Shift.

ICASSP2022 Hang-Rui Hu, Yan Song 0001, Ying Liu, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Domain Robust Deep Embedding Learning for Speaker Recognition.

ICASSP2022 Yuxuan Xi, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Frontend Attributes Disentanglement for Speech Emotion Recognition.

Interspeech2022 Zhifu Gao, Shiliang Zhang, Ian McLoughlin 0001, Zhijie Yan, 
Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition.

Interspeech2022 Hang-Rui Hu, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Class-Aware Distribution Alignment based Unsupervised Domain Adaptation for Speaker Verification.

TASLP2021 Jian Tang, Jie Zhang 0042, Yan Song 0001, Ian McLoughlin 0001, Li-Rong Dai 0001, 
Multi-Granularity Sequence Alignment Mapping for Encoder-Decoder Based End-to-End ASR.

ICASSP2021 Ying Liu, Yan Song 0001, Ian McLoughlin 0001, Lin Liu, Li-Rong Dai 0001, 
An Effective Deep Embedding Learning Method Based on Dense-Residual Networks for Speaker Verification.

ICASSP2021 Huy Phan, Huy Le Nguyen, Oliver Y. Chén, Philipp Koch, Ngoc Q. K. Duong, Ian McLoughlin 0001, Alfred Mertins, 
Self-Attention Generative Adversarial Network for Speech Enhancement.

ICASSP2021 Huy Phan, Huy Le Nguyen, Oliver Y. Chén, Lam Dang Pham, Philipp Koch, Ian McLoughlin 0001, Alfred Mertins, 
Multi-View Audio And Music Classification.

Interspeech2021 Zhifu Gao, Yiwu Yao, Shiliang Zhang, Jun Yang, Ming Lei, Ian McLoughlin 0001
Extremely Low Footprint End-to-End ASR System for Smart Device.

Interspeech2021 Hui Wang, Lin Liu, Yan Song 0001, Lei Fang, Ian McLoughlin 0001, Li-Rong Dai 0001, 
A Weight Moving Average Based Alternate Decoupled Learning Algorithm for Long-Tailed Language Identification.

Interspeech2021 Xu Zheng, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
An Effective Mutual Mean Teaching Based Domain Adaptation Method for Sound Event Detection.

TASLP2020 Olivier Perrotin, Ian Vince McLoughlin
Glottal Flow Synthesis for Whisper-to-Speech Conversion.

ICASSP2020 Hui Wang, Yan Song 0001, Zengxi Li, Ian McLoughlin 0001, Li-Rong Dai 0001, 
An Online Speaker-aware Speech Separation Approach Based on Time-domain Representation.

ICASSP2020 Jie Yan, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001
Task-Aware Mean Teacher Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection.

Interspeech2020 Zhifu Gao, Shiliang Zhang, Ming Lei, Ian McLoughlin 0001
SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition.

Interspeech2020 Ying Liu, Yan Song 0001, Yiheng Jiang, Ian McLoughlin 0001, Lin Liu, Li-Rong Dai 0001, 
An Effective Speaker Recognition Method Based on Joint Identification and Verification Supervisions.

Interspeech2020 Han Tong, Hamid R. Sharifzadeh, Ian McLoughlin 0001
Automatic Assessment of Dysarthric Severity Level Using Audio-Video Cross-Modal Approach in Deep Learning.

Interspeech2020 Zi-qiang Zhang, Yan Song 0001, Jian-Shu Zhang, Ian McLoughlin 0001, Li-Rong Dai 0001, 
Semi-Supervised End-to-End ASR via Teacher-Student Learning with Conditional Posterior Distribution.

Interspeech2020 Xu Zheng, Yan Song 0001, Jie Yan, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection.

#77  | Wenwu Wang 0001 | Google Scholar   DBLP
Venues: ICASSP: 16, Interspeech: 11, TASLP: 9, SpeechComm: 2
Years: 2022: 7, 2021: 5, 2020: 7, 2019: 6, 2018: 5, 2017: 7, 2016: 1
ISCA Sections: source separation: 2, acoustic event detection: 2, music and audio processing: 2, acoustic scene analysis: 1, acoustic event detection and classification: 1, speaker and language recognition: 1, acoustic event detection and acoustic scene classification: 1, speech coding and audio processing for noise reduction: 1
IEEE Keywords: audio signal processing: 14, source separation: 6, audio tagging: 5, convolutional neural nets: 4, weakly labelled data: 4, signal classification: 4, particle filtering (numerical methods): 3, filtering theory: 3, audio visual systems: 3, object tracking: 3, probability: 3, sound event detection: 2, deep learning (artificial intelligence): 2, object detection: 2, target tracking: 2, microphone arrays: 2, pattern clustering: 2, speech recognition: 2, multiple instance learning: 2, time frequency analysis: 2, deep neural networks: 2, speech intelligibility: 2, monaural source separation: 2, reverberation: 2, particle flow: 2, monte carlo methods: 2, smc phd filter: 2, audio visual tracking: 2, gaussian processes: 2, mixture models: 2, blind source separation: 2, loudspeakers: 2, cross modal task: 1, reinforcement learning: 1, backpropagation: 1, maximum likelihood estimation: 1, natural language processing: 1, gans: 1, audio captioning: 1, transductive inference: 1, few shot learning: 1, mutual learning: 1, signal detection: 1, speaker recognition: 1, direction of arrival estimation: 1, tracking: 1, multiple speaker tracking: 1, pmbm filter: 1, audio visual fusion: 1, neural net architecture: 1, monaural singing voice separation: 1, evolving multi resolution pooling cnn: 1, voice activity detection: 1, genetic algorithm: 1, pareto optimisation: 1, music: 1, signal resolution: 1, genetic algorithms: 1, neural architecture search: 1, two stream framework: 1, class wise attentional clips: 1, weak labels: 1, emotion recognition: 1, pretrained audio neural networks: 1, transfer learning: 1, computational complexity: 1, remote sensing: 1, image classification: 1, scene classification: 1, metric learning: 1, class imbalance: 1, meta learning: 1, spatial attention: 1, channel wise attention: 1, supervised learning: 1, out of distribution: 1, convolutional neural network: 1, pseudo labelling: 1, audio classification: 1, analytical solution: 1, frequency estimation: 1, jacobsen estimator: 1, window function: 1, interpolated dft: 1, interpolation: 1, discrete fourier transforms: 1, audioset: 1, attention neural network: 1, signal representation: 1, acoustic wave propagation: 1, binaural audio: 1, comb filter effect: 1, interaural coherence: 1, ipd: 1, multipath propagation: 1, ild: 1, rirs: 1, comb filters: 1, dereverberation mask: 1, transient response: 1, highly reverberant room environments: 1, background adaptation: 1, audio quality: 1, neural network: 1, listening experience: 1, intelligibility: 1, recurrent neural network: 1, recurrent neural nets: 1, proximal algorithm: 1, reverberant speech mixtures: 1, bootstrap averaging: 1, time frequency (t f) masking: 1, frequency domain analysis: 1, gaussian mixture model (gmm): 1, parameter estimation: 1, model based source separation: 1, expectation maximization (em) algorithm: 1, spectral histogram: 1, expectation maximisation algorithm: 1, audio recording: 1, image segmentation: 1, iterative methods: 1, spectral and spatial: 1, regression analysis: 1, binaural blind speech separation: 1, iterative dnn: 1, multilayer perceptrons: 1, deep neural network: 1, partial differential equations: 1, linear programming: 1, $\ell _{1}$ optimization: 1, sound reproduction: 1, mesh generation: 1, spatial sound reproduction: 1, sparsity: 1, vbap: 1, compressed sensing: 1, compressive sampling: 1, amplitude panning: 1, image sources: 1, reflectors: 1, geometry reconstruction: 1, acoustic scene analysis: 1, room impulse responses: 1, ellipsoids: 1, dcase 2016: 1, environmental audio tagging: 1, deep de noising auto encoder: 1, unsupervised feature learning: 1, audio signals: 1, acoustic event detection: 1, joint detection classification model: 1, acoustic signal detection: 1
Most Publications: 2022: 58, 2021: 39, 2018: 39, 2019: 32, 2017: 30

Affiliations
University of Surrey, Guildford, UK

ICASSP2022 Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang 0001
Diverse Audio Captioning Via Adversarial Training.

ICASSP2022 Dongchao Yang, Helin Wang, Yuexian Zou, Zhongjie Ye, Wenwu Wang 0001
A Mutual Learning Framework for Few-Shot Sound Event Detection.

ICASSP2022 Jinzheng Zhao, Peipei Wu, Xubo Liu, Yong Xu 0004, Lyudmila Mihaylova, Simon J. Godsill, Wenwu Wang 0001
Audio-Visual Tracking of Multiple Speakers Via a PMBM Filter.

Interspeech2022 Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley, Wenwu Wang 0001
Separate What You Describe: Language-Queried Audio Source Separation.

Interspeech2022 Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang 0001
On Metric Learning for Audio-Text Cross-Modal Retrieval.

Interspeech2022 Dongchao Yang, Helin Wang, Zhongjie Ye, Yuexian Zou, Wenwu Wang 0001
RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection.

Interspeech2022 Jinzheng Zhao, Peipei Wu, Xubo Liu, Shidrokh Goudarzi, Haohe Liu, Yong Xu 0004, Wenwu Wang 0001
Audio Visual Multi-Speaker Tracking with Improved GCF and PMBM Filter.

TASLP2021 Weitao Yuan, Bofei Dong, Shengbei Wang, Masashi Unoki, Wenwu Wang 0001
Evolving Multi-Resolution Pooling CNN for Monaural Singing Voice Separation.

ICASSP2021 Shuoyang Li, Yuhui Luo, Jonathon A. Chambers, Wenwu Wang 0001
Dimension Selected Subspace Clustering.

ICASSP2021 Helin Wang, Yuexian Zou, Wenwu Wang 0001
A Global-Local Attention Framework for Weakly Labelled Audio Tagging.

Interspeech2021 Helin Wang, Yuexian Zou, Wenwu Wang 0001
SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification.

Interspeech2021 Weitao Yuan, Shengbei Wang, Xiangrui Li, Masashi Unoki, Wenwu Wang 0001
Crossfire Conditional Generative Adversarial Networks for Singing Voice Extraction.

TASLP2020 Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang 0002, Wenwu Wang 0001, Mark D. Plumbley, 
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition.

ICASSP2020 Jian Guan, Jiabei Liu, Jianguo Sun, Pengming Feng, Tong Shuai, Wenwu Wang 0001
Meta Metric Learning for Highly Imbalanced Aerial Scene Classification.

ICASSP2020 Sixin Hong, Yuexian Zou, Wenwu Wang 0001, Meng Cao, 
Weakly Labelled Audio Tagging Via Convolutional Networks with Spatial and Channel-Wise Attention.

ICASSP2020 Turab Iqbal, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang 0001
Learning With Out-of-Distribution Data for Audio Classification.

ICASSP2020 Takahiro Murakami, Wenwu Wang 0001
An Analytical Solution to Jacobsen Estimator for Windowed Signals.

Interspeech2020 Sixin Hong, Yuexian Zou, Wenwu Wang 0001
Gated Multi-Head Attention Pooling for Weakly Labelled Audio Tagging.

Interspeech2020 Helin Wang, Yuexian Zou, Dading Chong, Wenwu Wang 0001
Environmental Sound Classification with Parallel Temporal-Spectral Attention.

TASLP2019 Qiuqiang Kong, Changsong Yu, Yong Xu 0004, Turab Iqbal, Wenwu Wang 0001, Mark D. Plumbley, 
Weakly Labelled AudioSet Tagging With Attention Neural Networks.

#78  | Sakriani Sakti | Google Scholar   DBLP
Venues: Interspeech: 26, TASLP: 9, ICASSP: 2, SpeechComm: 1
Years: 2022: 3, 2021: 6, 2020: 8, 2019: 7, 2018: 4, 2017: 4, 2016: 6
ISCA Sections: spoken machine translation: 3, speech synthesis: 2, the zero resource speech challenge 2020: 2, the zero resource speech challenge 2019: 2, speech translation: 2, self-supervised, semi-supervised, adaptation and data augmentation for asr: 1, low-resource speech recognition: 1, lm adaptation, lexical units and punctuation: 1, general topics in speech recognition: 1, neural signals for spoken communication: 1, topics in asr: 1, search methods for speech recognition: 1, speech in the brain: 1, sequence models for asr: 1, acoustic model adaptation: 1, statistical parametric speech synthesis: 1, cognition and brain studies: 1, speech translation and metadata for linguistic/discourse structure: 1, automatic learning of representations: 1, low resource speech recognition: 1
IEEE Keywords: speech synthesis: 6, speech recognition: 5, gaussian processes: 3, recurrent neural nets: 2, natural language processing: 2, zerospeech: 2, unsupervised phoneme discovery: 2, dpgmm: 2, unsupervised learning: 2, speech chain: 2, asr: 2, signal reconstruction: 2, tts: 2, regression analysis: 2, hidden markov models: 2, lombard effect: 1, speech intelligibility: 1, machine speech chain inference: 1, acoustic noise: 1, text to speech: 1, dynamic adaptation: 1, signal denoising: 1, low resource asr: 1, hearing: 1, infant speech perception: 1, engrams: 1, functional load: 1, rnn: 1, perception of phonemes: 1, automatic speech recognition: 1, emotion recognition: 1, affective computing: 1, chat based dialogue system: 1, human computer interaction: 1, information retrieval: 1, emotion elicitation: 1, interactive systems: 1, electroencephalography: 1, medical signal processing: 1, blind source separation: 1, brain: 1, speech artifact removal: 1, independent component analysis: 1, spoken word production: 1, tensor decomposition: 1, eeg: 1, cognition: 1, neurophysiology: 1, straight through estimator: 1, end to end feedback loss: 1, dirichlet process: 1, mixture of mixtures: 1, unsupervised subword modeling: 1, monte carlo methods: 1, gibbs sampling: 1, acoustic unit discovery: 1, bayesian nonparametrics: 1, markov processes: 1, language translation: 1, emphasis estimation: 1, word level emphasis: 1, speech to speech translation: 1, emphasis translation: 1, intent: 1, pattern classification: 1, post filter: 1, modulation spectrum: 1, trees (mathematics): 1, smoothing methods: 1, gmm based voice conversion: 1, clustergen: 1, mixture models: 1, global variance: 1, oversmoothing: 1, statistical parametric speech synthesis: 1
Most Publications: 2014: 41, 2018: 37, 2015: 35, 2020: 26, 2019: 26

Affiliations
URLs

TASLP2022 Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura 0001, 
A Machine Speech Chain Approach for Dynamically Adaptive Lombard TTS in Static and Dynamic Noise Environments.

TASLP2022 Bin Wu, Sakriani Sakti, Jinsong Zhang 0001, Satoshi Nakamura 0001, 
Modeling Unsupervised Empirical Adaptation by DPGMM and DPGMM-RNN Hybrid Model to Extract Perceptual Features for Low-Resource ASR.

Interspeech2022 Heli Qi, Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura 0001, 
Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR via Speech Chain Reconstruction and Self-Transcribing.

TASLP2021 Bin Wu, Sakriani Sakti, Jinsong Zhang 0001, Satoshi Nakamura 0001, 
Tackling Perception Bias in Unsupervised Phoneme Discovery Using DPGMM-RNN Hybrid Model and Functional Load.

Interspeech2021 Johanes Effendi, Sakriani Sakti, Satoshi Nakamura 0001, 
Weakly-Supervised Speech-to-Text Mapping with Visually Connected Non-Parallel Speech-Text Data Using Cyclic Partially-Aligned Transformer.

Interspeech2021 Yuka Ko, Katsuhito Sudoh, Sakriani Sakti, Satoshi Nakamura 0001, 
ASR Posterior-Based Loss for Multi-Task End-to-End Speech Translation.

Interspeech2021 Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura 0001, 
Dynamically Adaptive Machine Speech Chain Inference for TTS in Noisy Environment: Listen and Speak Louder.

Interspeech2021 Shun Takahashi, Sakriani Sakti, Satoshi Nakamura 0001, 
Unsupervised Neural-Based Graph Clustering for Variable-Length Speech Representation Discovery of Zero-Resource Languages.

Interspeech2021 Hirotaka Tokuyama, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura 0001, 
Transcribing Paralinguistic Acoustic Cues to Target Language Text in Transformer-Based Speech-to-Text Translation.

TASLP2020 Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001, 
Machine Speech Chain.

TASLP2020 Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001, 
Corrections to "Machine Speech Chain".

Interspeech2020 Ewan Dunbar, Julien Karadayi, Mathieu Bernard, Xuan-Nga Cao, Robin Algayres, Lucas Ondel, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux, 
The Zero Resource Speech Challenge 2020: Discovering Discrete Subword and Word Units.

Interspeech2020 Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001, 
Augmenting Images for ASR and TTS Through Single-Loop and Dual-Loop Multimodal Chain Framework.

Interspeech2020 Sashi Novitasari, Andros Tjandra, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura 0001, 
Incremental Machine Speech Chain Towards Enabling Listening While Speaking in Real-Time.

Interspeech2020 Ivan Halim Parmonangan, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura 0001, 
Combining Audio and Brain Activity for Predicting Speech Quality.

Interspeech2020 Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001, 
Transformer VQ-VAE for Unsupervised Unit Discovery and Speech Synthesis: ZeroSpeech 2020 Challenge.

Interspeech2020 Kazuki Tsunematsu, Johanes Effendi, Sakriani Sakti, Satoshi Nakamura 0001, 
Neural Speech Completion.

TASLP2019 Nurul Lubis, Sakriani Sakti, Koichiro Yoshino, Satoshi Nakamura 0001, 
Positive Emotion Elicitation in Chat-Based Dialogue Systems.

ICASSP2019 Holy Lovenia, Hiroki Tanaka, Sakriani Sakti, Ayu Purwarianti, Satoshi Nakamura 0001, 
Speech Artifact Removal from Eeg Recordings of Spoken Word Production with Tensor Decomposition.

ICASSP2019 Andros Tjandra, Sakriani Sakti, Satoshi Nakamura 0001, 
End-to-end Feedback Loss in Speech Chain Framework via Straight-through Estimator.

#79  | Xin Wang 0037 | Google Scholar   DBLP
Venues: Interspeech: 18, ICASSP: 14, TASLP: 5, SpeechComm: 1
Years: 2022: 4, 2021: 5, 2020: 11, 2019: 7, 2018: 4, 2017: 3, 2016: 4
ISCA Sections: speech synthesis: 8, voice anti-spoofing and countermeasure: 3, voice privacy challenge: 2, speech coding and restoration: 1, speech synthesis paradigms and methods: 1, the 2019 automatic speaker verification spoofing and countermeasures challenge: 1, prosody modeling and generation: 1, speech synthesis prosody: 1
IEEE Keywords: speech synthesis: 14, speaker recognition: 5, neural network: 4, speech recognition: 3, speech coding: 3, autoregressive processes: 3, vocoders: 3, recurrent neural nets: 3, natural language processing: 2, presentation attack detection: 2, speaker verification: 2, text to speech: 2, variational auto encoder: 2, hidden markov models: 2, fourier transforms: 2, fundamental frequency: 2, wavenet: 2, autoregressive model: 2, linkability: 1, speaker anonymization: 1, voice conversion: 1, privacy: 1, data privacy: 1, estimation theory: 1, anti spoofing: 1, countermeasure: 1, logical access: 1, computer crime: 1, tdnn: 1, feedforward neural nets: 1, resnet: 1, attention: 1, deep learning (artificial intelligence): 1, entertainment: 1, listening test: 1, rakugo: 1, duration modeling: 1, vector quantization: 1, spoofing counter measures: 1, security of data: 1, automatic speaker verification (asv): 1, detection cost function: 1, waveform model: 1, convolution: 1, filtering theory: 1, short time fourier transform: 1, transfer learning: 1, speaker adaptation: 1, speaker embeddings: 1, performance evaluation: 1, speech enhancement: 1, child speech extraction: 1, speech separation: 1, measures: 1, realistic conditions: 1, signal classification: 1, source separation: 1, reverberation: 1, sequences: 1, probability: 1, search problems: 1, stochastic processes: 1, sequence to sequence model: 1, sampling methods: 1, zero shot adaptation: 1, fine tuning: 1, musical instruments: 1, audio signal processing: 1, music: 1, neural waveform synthesizer: 1, musical instrument sounds synthesis: 1, neural net architecture: 1, neural waveform modeling: 1, maximum likelihood estimation: 1, spectral analysis: 1, waveform analysis: 1, gaussian distribution: 1, waveform modeling: 1, waveform generators: 1, gradient methods: 1, tacotron: 1, pipelines: 1, text analysis: 1, f0: 1, pitch: 1, general adversarial network: 1, autoregressive moving average processes: 1, autoregressive neural network: 1, recurrent neural network: 1, filters: 1, mixture density network: 1, linear transform: 1, model clustering: 1, pattern clustering: 1, hidden markov model: 1, singing voice synthesis: 1
Most Publications: 2021: 24; 2020: 24; 2022: 21; 2019: 21; 2018: 18

Affiliations
Graduate University for Advanced Studies (SOKENDAI), National Institute of Informatics, Department of Informatics, Tokyo, Japan

TASLP2022 Brij Mohan Lal Srivastava, Mohamed Maouche, Md. Sahidullah, Emmanuel Vincent 0001, Aurélien Bellet, Marc Tommasi, Natalia A. Tomashenko, Xin Wang 0037, Junichi Yamagishi, 
Privacy and Utility of X-Vector Based Speaker Anonymization.

ICASSP2022 Xin Wang 0037, Junichi Yamagishi, 
Estimating the Confidence of Speech Spoofing Countermeasure.

ICASSP2022 Chang Zeng, Xin Wang 0037, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi, 
Attention Back-End for Automatic Speaker Verification with Multiple Enrollment Utterances.

Interspeech2022 Xiaoxiao Miao, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, Natalia A. Tomashenko, 
Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions.

ICASSP2021 Shuhei Kato, Yusuke Yasuda, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, 
How Similar or Different is Rakugo Speech Synthesizer to Professional Performers?

ICASSP2021 Yusuke Yasuda, Xin Wang 0037, Junichi Yamagishi, 
End-to-End Text-to-Speech Using Latent Duration Based on VQ-VAE.

Interspeech2021 Tomi Kinnunen, Andreas Nautsch, Md. Sahidullah, Nicholas W. D. Evans, Xin Wang 0037, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee, 
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing.

Interspeech2021 Xin Wang 0037, Junichi Yamagishi, 
A Comparative Study on Recent Neural Spoofing Countermeasures for Synthetic Speech Detection.

Interspeech2021 Lin Zhang, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, Jose Patino 0001, Nicholas W. D. Evans, 
An Initial Investigation for Detecting Partially Spoofed Audio.

TASLP2020 Tomi Kinnunen, Héctor Delgado, Nicholas W. D. Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang 0037, Md. Sahidullah, Junichi Yamagishi, Douglas A. Reynolds, 
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals.

TASLP2020 Xin Wang 0037, Shinji Takaki, Junichi Yamagishi, 
Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis.

TASLP2020 Xin Wang 0037, Shinji Takaki, Junichi Yamagishi, Simon King, Keiichi Tokuda, 
A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis.

ICASSP2020 Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Fuming Fang, Xin Wang 0037, Nanxin Chen, Junichi Yamagishi, 
Zero-Shot Multi-Speaker Text-To-Speech with State-Of-The-Art Neural Speaker Embeddings.

ICASSP2020 Xin Wang 0037, Jun Du, Alejandrina Cristià, Lei Sun, Chin-Hui Lee, 
A Study of Child Speech Extraction Using Joint Speech Enhancement and Separation in Realistic Conditions.

ICASSP2020 Yusuke Yasuda, Xin Wang 0037, Junichi Yamagishi, 
Effect of Choice of Probability Distribution, Randomness, and Search Methods for Alignment Modeling in Sequence-to-Sequence Text-to-Speech Synthesis Using Hard Alignment.

ICASSP2020 Yi Zhao 0006, Xin Wang 0037, Lauri Juvela, Junichi Yamagishi, 
Transferring Neural Speech Waveform Synthesizers to Musical Instrument Sounds Generation.

Interspeech2020 Yang Ai, Xin Wang 0037, Junichi Yamagishi, Zhen-Hua Ling, 
Reverberation Modeling for Source-Filter-Based Neural Vocoder.

Interspeech2020 Brij Mohan Lal Srivastava, Natalia A. Tomashenko, Xin Wang 0037, Emmanuel Vincent 0001, Junichi Yamagishi, Mohamed Maouche, Aurélien Bellet, Marc Tommasi, 
Design Choices for X-Vector Based Speaker Anonymization.

Interspeech2020 Natalia A. Tomashenko, Brij Mohan Lal Srivastava, Xin Wang 0037, Emmanuel Vincent 0001, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Jose Patino 0001, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco, 
Introducing the VoicePrivacy Initiative.

Interspeech2020 Xin Wang 0037, Junichi Yamagishi, 
Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Model.

#80  | Bo Li 0028 | Google Scholar   DBLP
Venues: ICASSP: 19; Interspeech: 17; ICLR: 1; TASLP: 1
Years: 2022: 6; 2021: 7; 2020: 5; 2019: 5; 2018: 7; 2017: 6; 2016: 2
ISCA Section: neural network acoustic models for asr: 3; asr technologies and systems: 2; multi-, cross-lingual and other topics in asr: 2; search/decoding techniques and confidence measures for asr: 1; streaming for asr/rnn transducers: 1; speech classification: 1; training strategies for asr: 1; asr neural network architectures: 1; acoustic model adaptation: 1; speech and audio segmentation and classification: 1; far-field speech recognition: 1; far-field speech processing: 1; feature extraction and acoustic modeling using neural networks for asr: 1
IEEE Keyword: speech recognition: 18; recurrent neural nets: 10; natural language processing: 8; speech coding: 4; rnn t: 3; probability: 3; speaker recognition: 2; end to end asr: 2; conformer: 2; asr: 2; latency: 2; voice activity detection: 2; automatic speech recognition: 2; optimisation: 2; end to end speech recognition: 2; vocabulary: 2; sequence to sequence: 2; multilingual: 2; channel bank filters: 2; speech enhancement: 2; rnnt: 1; two pass asr: 1; long form asr: 1; fusion: 1; signal representation: 1; bilinear pooling: 1; gating: 1; cascaded encoders: 1; hidden markov models: 1; confidence scores: 1; calibration: 1; mean square error methods: 1; attention based end to end models: 1; transformer: 1; confidence: 1; regression analysis: 1; endpointer: 1; multi domain training: 1; data augmentation: 1; filtering theory: 1; unsupervised learning: 1; semi supervised training: 1; mobile handsets: 1; stimulated learning: 1; sequence classification: 1; connectionist temporal classification: 1; speech synthesis: 1; end to end speech synthesis: 1; lstm: 1; feedforward neural nets: 1; cnn: 1; decoding: 1; generative adversarial networks: 1; multi dialect: 1; adaptation: 1; computational linguistics: 1; encoder decoder: 1; seq2seq: 1; indian: 1; noise robust speech recognition: 1; microphones: 1; array signal processing: 1; beamforming: 1; spatial filters: 1; direction of arrival estimation: 1
Most Publications: 2022: 23; 2021: 13; 2017: 13; 2020: 12; 2019: 11

Affiliations
Google Inc., USA
National University of Singapore, Singapore (former)

ICASSP2022 Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.

ICASSP2022 Chao Zhang, Bo Li 0028, Zhiyun Lu, Tara N. Sainath, Shuo-Yiin Chang, 
Improving the Fusion of Acoustic and Text Representations in RNN-T.

Interspeech2022 Shuo-Yiin Chang, Bo Li 0028, Tara N. Sainath, Chao Zhang, Trevor Strohman, Qiao Liang, Yanzhang He, 
Turn-Taking Prediction for Natural Conversational Speech.

Interspeech2022 Shuo-Yiin Chang, Guru Prakash, Zelin Wu, Tara N. Sainath, Bo Li 0028, Qiao Liang, Adam Stambler, Shyam Upadhyay, Manaal Faruqui, Trevor Strohman, 
Streaming Intended Query Detection using E2E Modeling for Continued Conversation.

Interspeech2022 Bo Li 0028, Tara N. Sainath, Ruoming Pang, Shuo-Yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani, 
A Language Agnostic Multilingual Streaming On-Device ASR System.

Interspeech2022 Chao Zhang, Bo Li 0028, Tara N. Sainath, Trevor Strohman, Sepand Mavandadi, Shuo-Yiin Chang, Parisa Haghani, 
Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification.

ICASSP2021 Bo Li 0028, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han 0002, Qiao Liang, Yu Zhang 0033, Trevor Strohman, Yonghui Wu, 
A Better and Faster end-to-end Model for Streaming ASR.

ICASSP2021 Qiujia Li, David Qiu, Yu Zhang 0033, Bo Li 0028, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman, 
Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition.

ICASSP2021 David Qiu, Qiujia Li, Yanzhang He, Yu Zhang 0033, Bo Li 0028, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li 0133, Ke Hu, Tara N. Sainath, Ian McGraw, 
Learning Word-Level Confidence for Subword End-To-End ASR.

ICASSP2021 Jiahui Yu, Chung-Cheng Chiu, Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han 0002, Anmol Gulati, Yonghui Wu, Ruoming Pang, 
FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.

Interspeech2021 Qiujia Li, Yu Zhang 0033, Bo Li 0028, Liangliang Cao, Philip C. Woodland, 
Residual Energy-Based Models for End-to-End Speech Recognition.

Interspeech2021 Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Ruoming Pang, David Rybach, Cyril Allauzen, Ehsan Variani, James Qin, Quoc-Nam Le-The, Shuo-Yiin Chang, Bo Li 0028, Anmol Gulati, Jiahui Yu, Chung-Cheng Chiu, Diamantino Caseiro, Wei Li 0133, Qiao Liang, Pat Rondon, 
An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling.

ICLR2021 Jiahui Yu, Wei Han 0002, Anmol Gulati, Chung-Cheng Chiu, Bo Li 0028, Tara N. Sainath, Yonghui Wu, Ruoming Pang, 
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling.

ICASSP2020 Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Ruoming Pang, Yanzhang He, Trevor Strohman, Yonghui Wu, 
Towards Fast and Accurate Streaming End-To-End ASR.

ICASSP2020 Daniel S. Park, Yu Zhang 0033, Chung-Cheng Chiu, Youzheng Chen, Bo Li 0028, William Chan, Quoc V. Le, Yonghui Wu, 
Specaugment on Large Scale Datasets.

ICASSP2020 Tara N. Sainath, Yanzhang He, Bo Li 0028, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li 0133, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alexander Gruenstein, Ke Hu, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirkó Visontai, Yonghui Wu, Yu Zhang 0033, Ding Zhao, 
A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency.

Interspeech2020 Shuo-Yiin Chang, Bo Li 0028, David Rybach, Yanzhang He, Wei Li 0133, Tara N. Sainath, Trevor Strohman, 
Low Latency Speech Recognition Using End-to-End Prefetching.

Interspeech2020 Daniel S. Park, Yu Zhang 0033, Ye Jia, Wei Han 0002, Chung-Cheng Chiu, Bo Li 0028, Yonghui Wu, Quoc V. Le, 
Improved Noisy Student Training for Automatic Speech Recognition.

ICASSP2019 Bo Li 0028, Tara N. Sainath, Ruoming Pang, Zelin Wu, 
Semi-supervised Training for End-to-end Models via Weak Distillation.

ICASSP2019 Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, Qiao Liang, Deepti Bhatia, Yuan Shangguan, Bo Li 0028, Golan Pundak, Khe Chai Sim, Tom Bagby, Shuo-Yiin Chang, Kanishka Rao, Alexander Gruenstein, 
Streaming End-to-end Speech Recognition for Mobile Devices.

#81  | Zhuo Chen 0006 | Google Scholar   DBLP
Venues: Interspeech: 19; ICASSP: 17; TASLP: 2
Years: 2022: 9; 2021: 13; 2020: 6; 2019: 2; 2018: 4; 2017: 2; 2016: 2
ISCA Section: source separation: 4; speaker and language recognition: 1; other topics in speech recognition: 1; robust asr, and far-field/multi-talker asr: 1; single-channel speech enhancement: 1; tools, corpora and resources: 1; speaker diarization: 1; applications in transcription, education and learning: 1; multi- and cross-lingual asr, other topics in asr: 1; asr neural network architectures: 1; noise robust and distant speech recognition: 1; multi-channel speech enhancement: 1; rich transcription and asr systems: 1; distant asr: 1; noise reduction: 1; source separation and spatial audio: 1
IEEE Keyword: speech recognition: 12; speaker recognition: 10; source separation: 7; continuous speech separation: 6; recurrent neural nets: 5; speech separation: 5; audio signal processing: 4; dual path modeling: 2; speech enhancement: 2; automatic speech recognition: 2; speaker counting: 2; speaker diarization: 2; meeting transcription: 2; signal representation: 2; transformer: 2; probability: 2; convolutional neural nets: 2; deep clustering: 2; memory pool: 1; overlap ratio predictor: 1; teleconferencing: 1; personalized speech enhancement: 1; speech intelligibility: 1; speaker embedding: 1; perceptual speech quality: 1; voice activity detection: 1; rich transcription: 1; dual path rnn: 1; multi talker asr: 1; transducer: 1; long form meeting transcription: 1; recurrent selective attention network: 1; multi channel microphone: 1; deep learning (artificial intelligence): 1; conformer: 1; multi speaker asr: 1; natural language processing: 1; bayes methods: 1; minimum bayes risk training: 1; speaker identification: 1; long recording speech separation: 1; online processing: 1; transforms: 1; filtering theory: 1; system fusion: 1; libricss: 1; microphones: 1; overlapped speech: 1; permutation invariant training: 1; time domain: 1; recurrent neural networks: 1; frequency domain analysis: 1; lstm: 1; attentive pooling: 1; speaker verification: 1; cnn: 1; array signal processing: 1; speaker independent speech separation: 1; microphone arrays: 1; multi talker: 1; signal reconstruction: 1; attractor network: 1; far field: 1; acoustic model: 1; spotting: 1; data compression: 1; teacher student learning: 1; speaker invariant training: 1; adversarial learning: 1; deep neural networks: 1; estimation theory: 1; music separation: 1; approximation theory: 1; music: 1; pattern clustering: 1; singing voice separation: 1; optimisation: 1; embedding: 1; clustering: 1
Most Publications: 2021: 39; 2022: 25; 2020: 22; 2019: 11; 2018: 11

Affiliations
Microsoft, Redmond, WA, USA
Columbia University, New York, NY, USA (PhD 2017)

TASLP2022 Chenda Li, Zhuo Chen 0006, Yanmin Qian, 
Dual-Path Modeling With Memory Embedding Model for Continuous Speech Separation.

ICASSP2022 Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang 0009, Zhuo Chen 0006, Xuedong Huang 0001, 
Personalized Speech Enhancement: New Models and Comprehensive Evaluation.

ICASSP2022 Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.

ICASSP2022 Desh Raj, Liang Lu 0001, Zhuo Chen 0006, Yashesh Gaur, Jinyu Li 0001, 
Continuous Streaming Multi-Talker ASR with Dual-Path Transducers.

ICASSP2022 Yixuan Zhang, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li 0001, 
Continuous Speech Separation with Recurrent Selective Attention Network.

Interspeech2022 Sanyuan Chen, Yu Wu 0012, Chengyi Wang 0002, Shujie Liu 0001, Zhuo Chen 0006, Peidong Wang, Gang Liu, Jinyu Li 0001, Jian Wu 0027, Xiangzhan Yu, Furu Wei, 
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

Interspeech2022 Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.

Interspeech2022 Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.

Interspeech2022 Wangyou Zhang, Zhuo Chen 0006, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei, 
Separating Long-Form Speech with Group-wise Permutation Invariant Training.

ICASSP2021 Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Takuya Yoshioka, Shujie Liu 0001, Jin-Yu Li 0001, Xiangzhan Yu, 
Don't Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer.

ICASSP2021 Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Jinyu Li 0001, Takuya Yoshioka, Chengyi Wang 0002, Shujie Liu 0001, Ming Zhou 0001, 
Continuous Speech Separation with Conformer.

ICASSP2021 Naoyuki Kanda, Zhong Meng, Liang Lu 0001, Yashesh Gaur, Xiaofei Wang 0009, Zhuo Chen 0006, Takuya Yoshioka, 
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.

ICASSP2021 Chenda Li, Zhuo Chen 0006, Yi Luo 0004, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe 0001, Yanmin Qian, 
Dual-Path Modeling for Long Recording Speech Separation in Meetings.

ICASSP2021 Xiong Xiao, Naoyuki Kanda, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao 0008, Gang Liu, Yu Wu 0012, Jian Wu 0027, Shujie Liu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.

Interspeech2021 Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Shujie Liu 0001, Jinyu Li 0001, Xiangzhan Yu, 
Ultra Fast Speech Separation Model with Teacher Student Learning.

Interspeech2021 Sefik Emre Eskimez, Xiaofei Wang 0009, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen 0006, Huaming Wang, Takuya Yoshioka, 
Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement.

Interspeech2021 Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen 0006, Yanxin Hu, Lei Xie 0001, Jian Wu 0027, Hui Bu, Xin Xu, Jun Du, Jingdong Chen, 
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario.

Interspeech2021 Cong Han, Yi Luo 0004, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe 0001, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen 0006
Continuous Speech Separation Using Speaker Inventory for Long Recording.

Interspeech2021 Maokui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen 0006, Shinji Watanabe 0001, 
Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speakers.

Interspeech2021 Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
End-to-End Speaker-Attributed ASR with Transformer.

#82  | Zhengqi Wen | Google Scholar   DBLP
Venues: Interspeech: 25; ICASSP: 7; TASLP: 6
Years: 2022: 3; 2021: 8; 2020: 11; 2019: 7; 2018: 5; 2017: 2; 2016: 2
ISCA Section: speech synthesis: 6; voice conversion and adaptation: 3; statistical parametric speech synthesis: 2; topics in asr: 1; search/decoding techniques and confidence measures for asr: 1; computational resource constrained speech recognition: 1; multi-channel audio and emotion recognition: 1; speech enhancement: 1; asr neural network architectures: 1; sequence-to-sequence speech recognition: 1; spoken term detection, confidence measure, and end-to-end speech recognition: 1; speech and audio source separation and scene analysis: 1; nn architectures for asr: 1; speech synthesis paradigms and methods: 1; prosody modeling and generation: 1; speech recognition: 1; prosody and text processing: 1
IEEE Keyword: speech synthesis: 7; speech recognition: 7; natural language processing: 4; end to end: 4; speaker recognition: 4; text analysis: 3; transfer learning: 3; speech coding: 2; text based speech editing: 2; text editing: 2; end to end model: 2; decoding: 2; attention: 2; speaker adaptation: 2; low resource: 2; waveform generators: 1; stochastic processes: 1; vocoder: 1; filtering theory: 1; deterministic plus stochastic: 1; multiband excitation: 1; noise control: 1; vocoders: 1; coarse to fine decoding: 1; mask prediction: 1; text to speech: 1; one shot learning: 1; mask and prediction: 1; cross modal: 1; bert: 1; non autoregressive: 1; fast: 1; autoregressive processes: 1; language modeling: 1; teacher student learning: 1; robust end to end speech recognition: 1; speech distortion: 1; speech enhancement: 1; speech transformer: 1; gated recurrent fusion: 1; decoupled transformer: 1; code switching: 1; automatic speech recognition: 1; bi level decoupling: 1; prosody modeling: 1; personalized speech synthesis: 1; speaking style modeling: 1; few shot speaker adaptation: 1; prosody and voice factorization: 1; the m2voc challenge: 1; optimisation: 1; prosody transfer: 1; optimization strategy: 1; audio signal processing: 1; adversarial training: 1; cross lingual: 1; speaker embedding: 1; matrix decomposition: 1; phoneme representation: 1; adversarial multilingual training: 1; bottleneck features: 1; deep neural networks: 1
Most Publications: 2020: 22; 2021: 17; 2019: 17; 2016: 15; 2022: 14


TASLP2022 Tao Wang 0074, Ruibo Fu, Jiangyan Yi, Jianhua Tao, Zhengqi Wen
NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation.

TASLP2022 Tao Wang 0074, Jiangyan Yi, Ruibo Fu, Jianhua Tao, Zhengqi Wen
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing.

ICASSP2022 Tao Wang 0074, Jiangyan Yi, Liqun Deng, Ruibo Fu, Jianhua Tao, Zhengqi Wen
Context-Aware Mask Prediction Network for End-to-End Text-Based Speech Editing.

TASLP2021 Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang 0014, 
Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT.

TASLP2021 Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Zhengkun Tian, Shuai Zhang 0014, 
Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data.

TASLP2021 Cunhang Fan, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Bin Liu 0041, Zhengqi Wen
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition.

ICASSP2021 Shuai Zhang 0014, Jiangyan Yi, Zhengkun Tian, Ye Bai, Jianhua Tao, Zhengqi Wen
Decoupling Pronunciation and Language for End-to-End Code-Switching Automatic Speech Recognition.

ICASSP2021 Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Tao Wang 0074, Chunyu Qiang, 
Bi-Level Style and Prosody Decoupling Modeling for Personalized End-to-End Speech Synthesis.

ICASSP2021 Tao Wang 0074, Ruibo Fu, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Chunyu Qiang, Shiming Wang, 
Prosody and Voice Factorization for Few-Shot Speaker Adaptation in the Challenge M2voc 2021.

Interspeech2021 Shuai Zhang 0014, Jiangyan Yi, Zhengkun Tian, Ye Bai, Jianhua Tao, Xuefei Liu, Zhengqi Wen
End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-Switching Speech Recognition.

Interspeech2021 Zhengkun Tian, Jiangyan Yi, Ye Bai, Jianhua Tao, Shuai Zhang 0014, Zhengqi Wen
FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization.

ICASSP2020 Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Tao Wang 0074, 
Focusing on Attention: Prosody Transfer and Adaptative Optimization Strategy for Multi-Speaker End-to-End Speech Synthesis.

Interspeech2020 Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang 0014, 
Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition.

Interspeech2020 Cunhang Fan, Jianhua Tao, Bin Liu 0041, Jiangyan Yi, Zhengqi Wen
Gated Recurrent Fusion of Spatial and Spectral Features for Multi-Channel Speech Separation with Deep Embedding Representations.

Interspeech2020 Cunhang Fan, Jianhua Tao, Bin Liu 0041, Jiangyan Yi, Zhengqi Wen
Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations.

Interspeech2020 Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Chunyu Qiang, Tao Wang 0074, 
Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis.

Interspeech2020 Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Tao Wang 0074, Chunyu Qiang, 
Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis.

Interspeech2020 Zheng Lian, Zhengqi Wen, Xinyong Zhou, Songbai Pu, Shengkai Zhang, Jianhua Tao, 
ARVC: An Auto-Regressive Voice Conversion System Without Parallel Training Data.

Interspeech2020 Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Ye Bai, Shuai Zhang 0014, Zhengqi Wen
Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition.

Interspeech2020 Tao Wang 0074, Xuefei Liu, Jianhua Tao, Jiangyan Yi, Ruibo Fu, Zhengqi Wen
Non-Autoregressive End-to-End TTS with Coarse-to-Fine Decoding.

#83  | Pengyuan Zhang | Google Scholar   DBLP
Venues: Interspeech: 28; ICASSP: 5; SpeechComm: 2; TASLP: 2
Years: 2023: 1; 2022: 14; 2021: 9; 2020: 3; 2019: 6; 2018: 3; 2017: 1
ISCA Section: novel models and training methods for asr: 2; speech synthesis: 2; speaker embedding and diarization: 1; spatial audio: 1; speech emotion recognition: 1; spoken language processing: 1; multi-, cross-lingual and other topics in asr: 1; spoofing-aware automatic speaker verification (sasv): 1; atypical speech analysis and detection: 1; low-resource asr development: 1; voice conversion and adaptation: 1; single-channel speech enhancement: 1; speaker recognition: 1; voice anti-spoofing and countermeasure: 1; noise robust and distant speech recognition: 1; the fearless steps challenge phase-02: 1; general topics in speech recognition: 1; speech recognition and beyond: 1; lexicon and language model for speech recognition: 1; asr neural network training: 1; asr for noisy and far-field speech: 1; model adaptation for asr: 1; acoustic scenes and rare events: 1; neural network training strategies for asr: 1; language modeling: 1; noise robust and far-field asr: 1
IEEE Keyword: speech recognition: 7; natural language processing: 3; text analysis: 3; end to end: 2; signal classification: 2; decoding: 2; supervised learning: 1; self supervised pre training: 1; random processes: 1; speech enhancement: 1; least squares approximations: 1; sensor fusion: 1; frequency domain: 1; full band and sub band fusion: 1; dual path transformer: 1; image fusion: 1; pre trained language model: 1; knowledge transfer: 1; connectionist temporal classification: 1; non autoregressive: 1; end to end speech recognition: 1; ctc/attention speech recognition: 1; autoregressive processes: 1; speech coding: 1; pre training: 1; grammars: 1; transformer: 1; unpaired data: 1; multi level detection: 1; keyword spotting: 1; probability: 1; home automation: 1; vocabulary: 1; constrained attention: 1; rnn t: 1; recurrent neural nets: 1; interpretability: 1; convolutional neural nets: 1; autoregressive moving average processes: 1; autoregressive moving average: 1; neural language models: 1
Most Publications: 2022: 45; 2021: 24; 2019: 21; 2020: 15; 2023: 12


SpeechComm2023 Feng Dang, Hangting Chen, Qi Hu, Pengyuan Zhang, Yonghong Yan 0002, 
First coarse, fine afterward: A lightweight two-stage complex approach for monaural speech enhancement.

TASLP2022 Changfeng Gao, Gaofeng Cheng, Ta Li, Pengyuan Zhang, Yonghong Yan 0002, 
Self-Supervised Pre-Training for Attention-Based Encoder-Decoder ASR Model.

ICASSP2022 Feng Dang, Hangting Chen, Pengyuan Zhang
DPT-FSNet: Dual-Path Transformer Based Full-Band and Sub-Band Fusion Network for Speech Enhancement.

ICASSP2022 Keqi Deng, Songjun Cao, Yike Zhang, Long Ma, Gaofeng Cheng, Ji Xu, Pengyuan Zhang
Improving CTC-Based Speech Recognition Via Knowledge Transferring from Pre-Trained Language Models.

ICASSP2022 Keqi Deng, Zehui Yang, Shinji Watanabe 0001, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang
Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models.

Interspeech2022 Yifan Chen, Yifan Guo, Qingxuan Li, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan 0002, 
Interrelate Training and Searching: A Unified Online Clustering Framework for Speaker Diarization.

Interspeech2022 Hangting Chen, Yi Yang, Feng Dang, Pengyuan Zhang
Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output.

Interspeech2022 Chengxin Chen, Pengyuan Zhang
CTA-RNN: Channel and Temporal-wise Attention RNN leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition.

Interspeech2022 Yukun Liu, Ta Li, Pengyuan Zhang, Yonghong Yan 0002, 
NAS-SCAE: Searching Compact Attention-based Encoders For End-to-end Automatic Speech Recognition.

Interspeech2022 Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, Lei Xie, Yonghong Yan 0002, 
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset.

Interspeech2022 Lingxuan Ye, Gaofeng Cheng, Runyan Yang, Zehui Yang, Sanli Tian, Pengyuan Zhang, Yonghong Yan 0002, 
Improving Recognition of Out-of-vocabulary Words in E2E Code-switching ASR by Fusing Speech Generation Methods.

Interspeech2022 Yuxiang Zhang, Zhuo Li, Wenchao Wang, Pengyuan Zhang
SASV Based on Pre-trained ASV System and Integrated Scoring Module.

Interspeech2022 Xueshuai Zhang, Jiakun Shen, Jun Zhou, Pengyuan Zhang, Yonghong Yan 0002, Zhihua Huang, Yanfen Tang, Yu Wang, Fujie Zhang, Shaoxing Zhang, Aijun Sun, 
Robust Cough Feature Extraction and Classification Method for COVID-19 Cough Detection Based on Vocalization Characteristics.

Interspeech2022 Han Zhu, Li Wang, Gaofeng Cheng, Jindong Wang 0001, Pengyuan Zhang, Yonghong Yan 0002, 
Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR.

Interspeech2022 Han Zhu, Jindong Wang 0001, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan 0002, 
Decoupled Federated Learning for ASR with Non-IID Data.

SpeechComm2021 Danyang Liu, Ji Xu, Pengyuan Zhang, Yonghong Yan 0002, 
A unified system for multilingual speech recognition and language identification.

ICASSP2021 Changfeng Gao, Gaofeng Cheng, Runyan Yang, Han Zhu, Pengyuan Zhang, Yonghong Yan 0002, 
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Text Data.

ICASSP2021 Zuozhen Liu, Ta Li, Pengyuan Zhang
RNN-T Based Open-Vocabulary Keyword Spotting in Mandarin with Multi-Level Detection.

Interspeech2021 Ziyi Chen, Pengyuan Zhang
TVQVC: Transformer Based Vector Quantized Variational Autoencoder with CTC Loss for Voice Conversion.

Interspeech2021 Feng Dang, Pengyuan Zhang, Hangting Chen, 
Improved Speech Enhancement Using a Complex-Domain GAN with Fused Time-Domain and Time-Frequency Domain Constraints.

#84  | Samuel Thomas 0001 | Google Scholar   DBLP
Venues: Interspeech: 21; ICASSP: 15; TASLP: 1
Years: 2022: 8; 2021: 8; 2020: 6; 2019: 4; 2018: 3; 2017: 3; 2016: 5
ISCA Section: spoken language understanding: 3; multi-, cross-lingual and other topics in asr: 1; other topics in speech recognition: 1; spoken language modeling and understanding: 1; multi- and cross-lingual asr, other topics in asr: 1; multimodal systems: 1; low-resource speech recognition: 1; multilingual and code-switched asr: 1; asr neural network architectures: 1; multimodal speech processing: 1; model adaptation for asr: 1; rich transcription and asr systems: 1; adjusting to speaker, accent, and domain: 1; neural network training strategies for asr: 1; neural network acoustic models for asr: 1; conversational telephone speech recognition: 1; far-field, robustness and adaptation: 1; acoustic model adaptation: 1; low resource speech recognition: 1
IEEE Keyword: speech recognition: 10; natural language processing: 7; recurrent neural nets: 6; spoken language understanding: 6; automatic speech recognition: 5; text analysis: 4; speaker recognition: 3; data handling: 2; end to end systems: 2; rnn transducers: 2; software agents: 1; virtual reality: 1; intent classification: 1; weakly supervised learning: 1; text classification: 1; nearest neighbour methods: 1; nearest neighbors: 1; voice conversations: 1; speech coding: 1; decoding: 1; encoder decoder: 1; atis: 1; transducers: 1; attention: 1; spoken dialog system: 1; interactive systems: 1; natural languages: 1; end to end models: 1; end to end models: 1; language model customization: 1; adaptation: 1; speaker adaptation: 1; speaker change detection: 1; affine transforms: 1; speaker segmentation: 1; signal detection: 1; data analysis: 1; end to end systems: 1; transformer networks: 1; self supervised pre training: 1; speech to intent: 1; synthetic speech augmentation: 1; pre trained text embedding: 1; image sequences: 1; audio visual speech processing: 1; image restoration: 1; video signal processing: 1; image texture: 1; u net: 1; image inpainting: 1; n gram: 1; rnnlm: 1; vocabulary: 1; template: 1; interpolation: 1; subword: 1; broadcast news: 1; deep neural networks: 1; multi task learning: 1; acoustic modeling: 1; multi accent speech recognition: 1; end to end models: 1; cnn: 1; transforms: 1; joint training: 1; neural network: 1; denoising autoencoder: 1; channel bank filters: 1; feedforward neural nets: 1; signal denoising: 1; diarization: 1; event detection: 1; long short term memory: 1; music detection: 1; acoustic convolution: 1; acoustic noise: 1; matrix decomposition: 1; robust speech recognition: 1; acoustic features: 1; dictionary learning: 1; non negative matrix factorization: 1
Most Publications | 2022: 17 | 2021: 17 | 2020: 11 | 2019: 10 | 2010: 9

Affiliations
IBM Research AI, Thomas J. Watson Research Center, NY, USA
Johns Hopkins University, USA (former)

ICASSP2022 Zvi Kons, Aharon Satt, Hong-Kwang Kuo, Samuel Thomas 0001, Boaz Carmeli, Ron Hoory, Brian Kingsbury, 
A New Data Augmentation Method for Intent Classification Enhancement and its Application on Spoken Conversation Datasets.

ICASSP2022 Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas 0001, Brian Kingsbury, George Saon, 
Improving End-to-end Models for Set Prediction in Spoken Language Understanding.

ICASSP2022 Vishal Sunder, Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Jatin Ganhotra, Brian Kingsbury, Eric Fosler-Lussier, 
Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding.

ICASSP2022 Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Brian Kingsbury, George Saon, 
Towards Reducing the Need for Speech Training Data to Build Spoken Language Understanding Systems.

ICASSP2022 Samuel Thomas 0001, Brian Kingsbury, George Saon, Hong-Kwang Jeff Kuo, 
Integrating Text Inputs for Training and Adapting RNN Transducer ASR Models.

Interspeech2022 Takashi Fukuda, Samuel Thomas 0001, Masayuki Suzuki, Gakuto Kurata, George Saon, Brian Kingsbury, 
Global RNN Transducer Models For Multi-dialect Speech Recognition.

Interspeech2022 Zvi Kons, Hagai Aronowitz, Edmilson da Silva Morais, Matheus Damasceno, Hong-Kwang Kuo, Samuel Thomas 0001, George Saon, 
Extending RNN-T-based speech recognition systems with emotion and language classification.

Interspeech2022 Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas 0001, Hong-Kwang Kuo, Brian Kingsbury, 
Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems.

TASLP2021 Leda Sari, Mark Hasegawa-Johnson, Samuel Thomas 0001
Auxiliary Networks for Joint Speaker Adaptation and Speaker Change Detection.

ICASSP2021 Samuel Thomas 0001, Hong-Kwang Jeff Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, 
RNN Transducer Models for Spoken Language Understanding.

ICASSP2021 Edmilson da Silva Morais, Hong-Kwang Jeff Kuo, Samuel Thomas 0001, Zoltán Tüske, Brian Kingsbury, 
End-to-End Spoken Language Understanding Using Transformer Networks and Self-Supervised Pre-Trained Features.

Interspeech2021 Sujeong Cha, Wangrui Hou, Hyun Jung, My Phung, Michael Picheny, Hong-Kwang Jeff Kuo, Samuel Thomas 0001, Edmilson da Silva Morais, 
Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs.

Interspeech2021 Takashi Fukuda, Samuel Thomas 0001
Knowledge Distillation Based Training of Universal ASR Source Models for Cross-Lingual Transfer.

Interspeech2021 Jatin Ganhotra, Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Sachindra Joshi, George Saon, Zoltán Tüske, Brian Kingsbury, 
Integrating Dialog History into End-to-End Spoken Language Understanding Systems.

Interspeech2021 Andrew Rouditchenko, Angie W. Boggust, David Harwath, Samuel Thomas 0001, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogério Feris, Brian Kingsbury, Michael Picheny, James R. Glass, 
Cascaded Multilingual Audio-Visual Learning from Videos.

Interspeech2021 Andrew Rouditchenko, Angie W. Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas 0001, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogério Schmidt Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba 0001, James R. Glass, 
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.

ICASSP2020 Yinghui Huang, Hong-Kwang Kuo, Samuel Thomas 0001, Zvi Kons, Kartik Audhkhasi, Brian Kingsbury, Ron Hoory, Michael Picheny, 
Leveraging Unpaired Text Data for Training End-To-End Speech-to-Intent Systems.

ICASSP2020 Alexandros Koumparoulis, Gerasimos Potamianos, Samuel Thomas 0001, Edmilson da Silva Morais, 
Audio-Assisted Image Inpainting for Talking Faces.

Interspeech2020 Samuel Thomas 0001, Kartik Audhkhasi, Brian Kingsbury, 
Transliteration Based Data Augmentation for Training Multilingual ASR Acoustic Models in Low Resource Settings.

Interspeech2020 Takashi Fukuda, Samuel Thomas 0001
Implicit Transfer of Privileged Acoustic Information in a Generalized Knowledge Distillation Framework.

#85  | Jon Barker | Google Scholar   DBLP
Venues | Interspeech: 23 | ICASSP: 10 | SpeechComm: 2 | TASLP: 1
Years | 2022: 9 | 2021: 7 | 2020: 5 | 2019: 2 | 2018: 5 | 2017: 4 | 2016: 4
ISCA Sections | speech intelligibility prediction for hearing-impaired listeners: 3 | source separation: 2 | multi-channel speech enhancement and hearing aids: 2 | speech analysis and representation: 2 | special session: 2 | technology for disordered speech: 1 | novel models and training methods for asr: 1 | assessment of pathological speech and language: 1 | noise robust and distant speech recognition: 1 | speech in health: 1 | applications of language technologies: 1 | robust speech recognition: 1 | spatial and phase cues for source separation and speech recognition: 1 | noise robust speech recognition: 1 | speech-enhancement: 1 | far-field, robustness and adaptation: 1 | far-field speech processing: 1
IEEE Keywords | speech recognition: 8 | source separation: 3 | handicapped aids: 3 | filtering theory: 2 | natural language processing: 2 | data augmentation: 2 | time domain analysis: 2 | speaker recognition: 2 | gaussian processes: 2 | multi stream acoustic modelling: 1 | speech coding: 1 | dysarthric automatic speech recognition: 1 | source filter separation and fusion: 1 | microphones: 1 | audio signal processing: 1 | data simulation: 1 | automatic speech recognition: 1 | deep neural network: 1 | auditory model: 1 | voice source: 1 | sung speech: 1 | optimisation: 1 | hearing: 1 | medical signal processing: 1 | speech intelligibility: 1 | hearing aid speech processing: 1 | backpropagation: 1 | hearing aids: 1 | intelligibility objective: 1 | differentiable framework: 1 | reverberation: 1 | multi channel source separation: 1 | multi speaker extraction: 1 | noise: 1 | object recognition: 1 | probability: 1 | dysarthric speech recognition: 1 | transfer learning: 1 | entropy: 1 | gaussian distribution: 1 | data selection: 1 | posterior probability: 1 | vocabulary: 1 | spectral analysis: 1 | language modelling: 1 | out of domain data: 1 | continuous dysarthric speech recognition: 1 | multi channel: 1 | multi speaker asr: 1 | end to end: 1 | convolutional neural nets: 1 | tasnet: 1 | speech separation: 1 | phonetics: 1 | personalised speech recognition: 1 | speech tempo: 1 | dysarthria: 1 | hidden markov models: 1 | mixture models: 1 | statistical normalisation: 1 | phase distribution: 1 | phase spectrum: 1 | signal representation: 1 | robust speech recognition: 1
Most Publications | 2022: 16 | 2021: 12 | 2017: 12 | 2018: 11 | 2015: 10


TASLP2022 Zhengjun Yue, Erfan Loweimi, Heidi Christensen, Jon Barker, Zoran Cvetkovic, 
Acoustic Modelling From Raw Source and Filter Components for Dysarthric Speech Recognition.

ICASSP2022 Jack Deadman, Jon Barker
Improved Simulation of Realistically-Spatialised Simultaneous Speech Using Multi-Camera Analysis in The Chime-5 Dataset.

ICASSP2022 Zehai Tu, Jack Deadman, Ning Ma 0002, Jon Barker
Auditory-Based Data Augmentation for end-to-end Automatic Speech Recognition.

Interspeech2022 Jon Barker, Michael Akeroyd, Trevor J. Cox, John F. Culling, Jennifer Firth, Simone Graetzer, Holly Griffiths, Lara Harris, Graham Naylor, Zuzanna Podwinska, Eszter Porter, Rhoddy Viveros Muñoz, 
The 1st Clarity Prediction Challenge: A machine learning challenge for hearing aid intelligibility prediction.

Interspeech2022 Jack Deadman, Jon Barker
Modelling Turn-taking in Multispeaker Parties for Realistic Data Simulation.

Interspeech2022 Zehai Tu, Ning Ma 0002, Jon Barker
Exploiting Hidden Representations from a DNN-based Speech Recogniser for Speech Intelligibility Prediction in Hearing-impaired Listeners.

Interspeech2022 Zehai Tu, Ning Ma 0002, Jon Barker
Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction.

Interspeech2022 Zhengjun Yue, Erfan Loweimi, Heidi Christensen, Jon Barker, Zoran Cvetkovic, 
Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs.

Interspeech2022 Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker
On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training.

ICASSP2021 Gerardo Roa Dabike, Jon Barker
The use of Voice Source Features for Sung Speech Recognition.

ICASSP2021 Zehai Tu, Ning Ma 0002, Jon Barker
DHASP: Differentiable Hearing Aid Speech Processing.

ICASSP2021 Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker
Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism.

Interspeech2021 Simone Graetzer, Jon Barker, Trevor J. Cox, Michael Akeroyd, John F. Culling, Graham Naylor, Eszter Porter, Rhoddy Viveros Muñoz, 
Clarity-2021 Challenges: Machine Learning Challenges for Advancing Hearing Aid Processing.

Interspeech2021 Zehai Tu, Ning Ma 0002, Jon Barker
Optimising Hearing Aid Fittings for Speech in Noise with a Differentiable Hearing Loss Model.

Interspeech2021 Zhengjun Yue, Jon Barker, Heidi Christensen, Cristina McKean, Elaine Ashton, Yvonne Wren, Swapnil Gadgil, Rebecca Bright, 
Parental Spoken Scaffolding and Narrative Skills in Crowd-Sourced Storytelling Samples of Young Children.

Interspeech2021 Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker
Teacher-Student MixIT for Unsupervised and Semi-Supervised Speech Separation.

ICASSP2020 Feifei Xiong, Jon Barker, Zhengjun Yue, Heidi Christensen, 
Source Domain Data Selection for Improved Transfer Learning Targeting Dysarthric Speech Recognition.

ICASSP2020 Zhengjun Yue, Feifei Xiong, Heidi Christensen, Jon Barker
Exploring Appropriate Acoustic and Language Modelling Choices for Continuous Dysarthric Speech Recognition.

ICASSP2020 Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker
On End-to-end Multi-channel Time Domain Speech Separation in Reverberant Environments.

Interspeech2020 Jack Deadman, Jon Barker
Simulating Realistically-Spatialised Simultaneous Speech Using Video-Driven Speaker Detection and the CHiME-5 Dataset.

#86  | Thomas Hain | Google Scholar   DBLP
Venues | Interspeech: 30 | ICASSP: 6
Years | 2022: 5 | 2021: 3 | 2020: 8 | 2019: 3 | 2018: 2 | 2017: 5 | 2016: 10
ISCA Sections | speech emotion recognition: 2 | speech analysis and representation: 2 | multimodal processing: 2 | far-field, robustness and adaptation: 2 | speech intelligibility prediction for hearing-impaired listeners: 1 | asr: 1 | low-resource asr development: 1 | search/decoding techniques and confidence measures for asr: 1 | speech and audio quality assessment: 1 | the zero resource speech challenge 2020: 1 | multilingual and code-switched asr: 1 | speaker recognition: 1 | learning techniques for speaker recognition: 1 | computational paralinguistics: 1 | speech recognition and beyond: 1 | applications of language technologies: 1 | network architectures for emotion and paralinguistics recognition: 1 | applications in education and learning: 1 | language models for asr: 1 | noise robust speech recognition: 1 | language modeling for conversational speech and confidence measures: 1 | dialogue systems and analysis of dialogue: 1 | language model adaptation: 1 | special session: 1 | speaker diarization and recognition: 1 | language recognition: 1
IEEE Keywords | speech recognition: 3 | gaussian processes: 2 | speaker recognition: 2 | filtering theory: 2 | submodular: 1 | natural language processing: 1 | unsupervised: 1 | hidden markov models: 1 | data selection: 1 | contrastive loss: 1 | perception bias: 1 | pronunciation assessment: 1 | l2 learning: 1 | voice activity detection: 1 | self attention: 1 | gaussian noise: 1 | convolutional neural nets: 1 | audio anomaly classification: 1 | audio recording: 1 | temporal convolutional network: 1 | distortion: 1 | audio signal processing: 1 | signal denoising: 1 | multiple hypothesis: 1 | signal classification: 1 | back propagation: 1 | semi supervised adaptation: 1 | connectionist temporal classification: 1 | end to end speech recognition: 1 | statistical normalisation: 1 | phase distribution: 1 | phase spectrum: 1 | signal representation: 1 | robust speech recognition: 1 | speaker channels: 1 | multi channel: 1 | microphones: 1 | crosstalk: 1 | deep neural networks: 1 | speaker diarisation: 1
Most Publications | 2015: 22 | 2022: 21 | 2020: 20 | 2016: 18 | 2021: 13

Affiliations
University of Sheffield, England, UK

ICASSP2022 Chanho Park, Rehan Ahmad, Thomas Hain
Unsupervised Data Selection for Speech Recognition with Contrastive Loss Ratios.

ICASSP2022 Jose Antonio Lopez Saenz, Thomas Hain
A Model for Assessor Bias in Automatic Pronunciation Assessment.

Interspeech2022 George Close, Samuel Hollands, Stefan Goetze, Thomas Hain
Non-intrusive Speech Intelligibility Metric Prediction for Hearing Impaired Individuals.

Interspeech2022 Muhammad Umar Farooq, Thomas Hain
Investigating the Impact of Crosslingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition.

Interspeech2022 Muhammad Umar Farooq, Darshan Adiga Haniya Narayana, Thomas Hain
Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion.

ICASSP2021 Qiang Huang 0008, Thomas Hain
Improving Audio Anomalies Recognition Using Temporal Convolutional Attention Networks.

ICASSP2021 Cong-Thanh Do, Rama Doddipatla, Thomas Hain
Multiple-Hypothesis CTC-Based Semi-Supervised Adaptation of End-to-End Speech Recognition.

Interspeech2021 Anna Ollerenshaw, Md. Asif Jalal, Thomas Hain
Insights on Neural Representations for End-to-End Speech Recognition.

Interspeech2020 Qiang Huang 0008, Thomas Hain
Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models.

Interspeech2020 Mingjie Chen, Thomas Hain
Unsupervised Acoustic Unit Representation Learning for Voice Conversion Using WaveNet Auto-Encoders.

Interspeech2020 Md. Asif Jalal, Rosanna Milner, Thomas Hain
Empirical Interpretation of Speech Emotion Perception with Attention Based Model for Speech Emotion Recognition.

Interspeech2020 Md Asif Jalal, Rosanna Milner, Thomas Hain, Roger K. Moore, 
Removing Bias with Residual Mixture of Multi-View Attention for Speech Emotion Recognition.

Interspeech2020 Hardik B. Sailor, Thomas Hain
Multilingual Speech Recognition Using Language-Specific Phoneme Recognition as Auxiliary Task for Indian Languages.

Interspeech2020 Yanpei Shi, Qiang Huang 0008, Thomas Hain
Speaker Re-Identification with Speaker Dependent Speech Enhancement.

Interspeech2020 Yanpei Shi, Qiang Huang 0008, Thomas Hain
Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification.

Interspeech2020 Lukas Stappen, Georgios Rizos, Madina Hasan, Thomas Hain, Björn W. Schuller, 
Uncertainty-Aware Machine Support for Paper Reviewing on the Interspeech 2019 Submission Corpus.

Interspeech2019 Mortaza Doulaty, Thomas Hain
Latent Dirichlet Allocation Based Acoustic Data Selection for Automatic Speech Recognition.

Interspeech2019 Qiang Huang 0008, Thomas Hain
Detecting Mismatch Between Speech and Transcription Using Cross-Modal Attention.

Interspeech2019 Md Asif Jalal, Erfan Loweimi, Roger K. Moore, Thomas Hain
Learning Temporal Clusters Using Capsule Routing for Speech Emotion Recognition.

Interspeech2018 Erfan Loweimi, Jon Barker, Thomas Hain
On the Usefulness of the Speech Phase Spectrum for Pitch Extraction.

#87  | Tomi Kinnunen | Google Scholar   DBLP
Venues | Interspeech: 20 | ICASSP: 7 | TASLP: 5 | SpeechComm: 4
Years | 2022: 4 | 2021: 2 | 2020: 5 | 2019: 6 | 2018: 4 | 2017: 9 | 2016: 6
ISCA Sections | the attacker's perspective on automatic speaker verification: 2 | speaker recognition evaluation: 2 | special session: 2 | robust speaker recognition and anti-spoofing: 2 | spoofing-aware automatic speaker verification (sasv): 1 | speech coding and privacy: 1 | voice anti-spoofing and countermeasure: 1 | speaker recognition: 1 | speaker embedding: 1 | the 2019 automatic speaker verification spoofing and countermeasures challenge: 1 | speaker recognition and diarization: 1 | deep learning for source separation and pitch tracking: 1 | speaker verification: 1 | speaker database and anti-spoofing: 1 | short utterances speaker recognition: 1 | speech and audio segmentation and classification: 1
IEEE Keywords | speaker recognition: 9 | speaker verification: 4 | probability: 3 | security of data: 2 | spoofing: 2 | discriminative training: 2 | speech recognition: 2 | security: 1 | reinforcement learning: 1 | spoof countermeasures: 1 | nonlinear compression: 1 | multi regime compression: 1 | data compression: 1 | deep learning (artificial intelligence): 1 | spoofing counter measures: 1 | automatic speaker verification (asv): 1 | presentation attack detection: 1 | detection cost function: 1 | regression analysis: 1 | fundamental frequency: 1 | recurrent neural nets: 1 | frequency estimation: 1 | signal classification: 1 | pitch: 1 | recurrent neural networks: 1 | f0: 1 | waveform to sinusoid regression: 1 | regression model: 1 | mimicry: 1 | large scale speaker identification: 1 | web service: 1 | speaker ranking: 1 | social networking (online): 1 | public demo: 1 | voxceleb: 1 | optimisation: 1 | factor analysis: 1 | natural language processing: 1 | language identification: 1 | plda: 1 | language detection: 1 | gender classification: 1 | gender dependent system: 1 | i vector: 1 | voice conversion: 1 | non parallel training: 1 | replay: 1 | english corpus: 1 | i vector system: 1 | attribute detectors: 1 | signal representation: 1 | natural languages: 1 | finnish corpus: 1 | linguistics: 1 | multi task learning: 1 | probabilistic linear discriminant analysis: 1
Most Publications | 2018: 25 | 2020: 24 | 2021: 22 | 2022: 21 | 2013: 18


SpeechComm2022 Lauri Tavi, Tomi Kinnunen, Rosa González Hautamäki, 
Improving speaker de-identification with functional data analysis of f0 trajectories.

TASLP2022 Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi, 
Optimizing Tandem Speaker Verification and Anti-Spoofing Systems.

ICASSP2022 Xuechen Liu, Md. Sahidullah, Tomi Kinnunen
Learnable Nonlinear Compression for Robust Speaker Verification.

Interspeech2022 Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Ha-Jin Yu, Nicholas W. D. Evans, Tomi Kinnunen
SASV 2022: The First Spoofing-Aware Speaker Verification Challenge.

Interspeech2021 Bhusan Chettri, Rosa González Hautamäki, Md. Sahidullah, Tomi Kinnunen
Data Quality as Predictor of Voice Anti-Spoofing Generalization.

Interspeech2021 Tomi Kinnunen, Andreas Nautsch, Md. Sahidullah, Nicholas W. D. Evans, Xin Wang 0037, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee, 
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing.

TASLP2020 Tomi Kinnunen, Héctor Delgado, Nicholas W. D. Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang 0037, Md. Sahidullah, Junichi Yamagishi, Douglas A. Reynolds, 
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals.

Interspeech2020 Rohan Kumar Das, Xiaohai Tian, Tomi Kinnunen, Haizhou Li 0001, 
The Attacker's Perspective on Automatic Speaker Verification: An Overview.

Interspeech2020 Rosa González Hautamäki, Tomi Kinnunen
Why Did the x-Vector System Miss a Target Speaker? Impact of Acoustic Mismatch Upon Target Score on VoxCeleb Data.

Interspeech2020 Xuechen Liu, Md. Sahidullah, Tomi Kinnunen
A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings.

Interspeech2020 Alexey Sholokhov, Tomi Kinnunen, Ville Vestman, Kong Aik Lee, 
Extrapolating False Alarm Rates in Automatic Speaker Verification.

TASLP2019 Akihiro Kato, Tomi H. Kinnunen
Statistical Regression Models for Noise Robust F0 Estimation Using Recurrent Deep Neural Networks.

ICASSP2019 Tomi Kinnunen, Rosa González Hautamäki, Ville Vestman, Md. Sahidullah, 
Can We Use Speaker Recognition Technology to Attack Itself? Enhancing Mimicry Attacks Using Automatic Target Speaker Selection.

ICASSP2019 Ville Vestman, Bilal Soomro, Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen
Who Do I Sound like? Showcasing Speaker Recognition Technology by Youtube Voice Search.

Interspeech2019 Kong Aik Lee, Ville Hautamäki, Tomi H. Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang 0019, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li 0001, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang 0039, Chunlei Zhang, Fahimeh Bahmaninezhad, Héctor Delgado, Massimiliano Todisco, 
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences.

Interspeech2019 Massimiliano Todisco, Xin Wang 0037, Ville Vestman, Md. Sahidullah, Héctor Delgado, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Tomi H. Kinnunen, Kong Aik Lee, 
ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection.

Interspeech2019 Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen, Takafumi Koshinaka, 
Unleashing the Unused Potential of i-Vectors Enabled by GPU Acceleration.

SpeechComm2018 Rosa González Hautamäki, Md. Sahidullah, Ville Hautamäki, Tomi Kinnunen
Acoustical and perceptual study of voice disguise by age modification in speaker verification.

SpeechComm2018 Ville Vestman, Dhananjaya N. Gowda, Md. Sahidullah, Paavo Alku, Tomi Kinnunen
Speaker recognition from whispered speech: A tutorial survey and an application of time-varying linear prediction.

Interspeech2018 Akihiro Kato, Tomi Kinnunen
Waveform to Single Sinusoid Regression to Estimate the F0 Contour from Noisy Speech Using Recurrent Deep Neural Networks.

#88  | Nancy F. Chen | Google Scholar   DBLP
Venues | Interspeech: 19 | ICASSP: 8 | TASLP: 5 | SpeechComm: 2 | ACL: 1 | EMNLP: 1
Years | 2022: 4 | 2021: 4 | 2020: 3 | 2019: 3 | 2018: 6 | 2017: 6 | 2016: 10
ISCA Sections | special session: 3 | applications in transcription, education and learning: 2 | speech recognition: 2 | speech synthesis: 1 | spoken language processing: 1 | show and tell: 1 | pronunciation: 1 | acoustic phonetics of l1-l2 and other interactions: 1 | speech annotation and speech assessment: 1 | extracting information from audio: 1 | multi-lingual models and adaptation for asr: 1 | resources and annotation of resources: 1 | low resource speech recognition: 1 | speech and hearing disorders & perception: 1 | spoken term detection: 1
IEEE Keywords | speech recognition: 10 | natural language processing: 8 | keyword spotting: 4 | computer assisted pronunciation training (capt): 3 | under resourced languages: 3 | probability: 2 | automatic speech recognition (asr): 2 | speech coding: 2 | incremental learning: 1 | continual learning: 1 | chinese: 1 | domain adaptation: 1 | adaptive filters: 1 | hierarchical character embeddings: 1 | spell check: 1 | text analysis: 1 | unsupervised feature adaptation: 1 | unsupervised learning: 1 | child speech recognition: 1 | automatic speech evaluation: 1 | non native tone modeling and mispronunciation detection: 1 | pattern classification: 1 | computer assisted language learning (call): 1 | language translation: 1 | transliteration: 1 | cross lingual information retrieval: 1 | machine translation: 1 | named entity recognition: 1 | linguistics: 1 | multi task learning: 1 | gaussian processes: 1 | phone recognition: 1 | mismatched transcription: 1 | hidden markov models: 1 | probabilistic transcription: 1 | mismatched machine transcription: 1 | transfer learning: 1 | crowdsourcing: 1 | zero resourced languages: 1 | modular system: 1 | tone recognition and mispronunciation detection: 1 | computer assistant language learning (call): 1 | electroencephalography: 1 | medical signal processing: 1 | automatic speech recognition: 1 | eeg: 1 | mismatched crowdsourcing: 1 | recurrent neural network: 1 | recurrent neural nets: 1 | multilingual data selection: 1 | language identification: 1 | estimation theory: 1 | deep neural network (dnn): 1 | large vocabulary continuous speech recognition (lvcsr): 1 | spoken term detection (std): 1 | extended recognition network (ern): 1 | decoding: 1 | error correction: 1 | dynamic time warping (dtw): 1 | computational linguistics: 1 | spoken term detection: 1 | submodular optimization: 1 | active learning: 1
Most Publications | 2022: 28 | 2021: 23 | 2016: 20 | 2020: 18 | 2019: 16


ICASSP2022 Yizheng Huang, Nana Hou, Nancy F. Chen
Progressive Continual Learning for Spoken Keyword Spotting.

Interspeech2022 Perry Lam, Huayun Zhang, Nancy F. Chen, Berrak Sisman, 
EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models.

Interspeech2022 Zhengyuan Liu, Nancy F. Chen
Dynamic Sliding Window Modeling for Abstractive Meeting Summarization.

Interspeech2022 Jeremy Heng Meng Wong, Huayun Zhang, Nancy F. Chen
Variations of multi-task learning for spoken language assessment.

TASLP2021 Minh Nguyen 0002, Gia H. Ngo, Nancy F. Chen
Domain-Shift Conditioning Using Adaptable Filtering Via Hierarchical Embeddings for Robust Chinese Spell Check.

ICASSP2021 Richeng Duan, Nancy F. Chen
Senone-Aware Adversarial Multi-Task Training for Unsupervised Child to Adult Speech Adaptation.

Interspeech2021 Ke Shi 0001, Kye Min Tan, Huayun Zhang, Siti Umairah Md. Salleh, Shikang Ni, Nancy F. Chen
WittyKiddy: Multilingual Spoken Language Learning for Kids.

Interspeech2021 Huayun Zhang, Ke Shi 0001, Nancy F. Chen
Multilingual Speech Evaluation: Case Studies on English, Malay and Tamil.

Interspeech2020 Richeng Duan, Nancy F. Chen
Unsupervised Feature Adaptation Using Adversarial Multi-Task Training for Automatic Evaluation of Children's Speech.

Interspeech2020 Yuling Gu, Nancy F. Chen
Characterization of Singaporean Children's English: Comparisons to American and British Counterparts Using Archetypal Analysis.

Interspeech2020 Ke Shi 0001, Kye Min Tan, Richeng Duan, Siti Umairah Md. Salleh, Nur Farah Ain Suhaimi, Rajan Vellu, Ngoc Thuy Huong Helen Thai, Nancy F. Chen
Computer-Assisted Language Learning System: Automatic Speech Evaluation for Children Learning Malay and Tamil.

TASLP2019 Wei Li 0119, Nancy F. Chen, Sabato Marco Siniscalchi, Chin-Hui Lee, 
Improving Mispronunciation Detection of Mandarin Tones for Non-Native Learners With Soft-Target Tone Labels and BLSTM-Based Deep Tone Models.

TASLP2019 Hoang Gia Ngo, Minh Nguyen 0002, Nancy F. Chen
Phonology-Augmented Statistical Framework for Machine Transliteration Using Limited Linguistic Resources.

ACL2019 Zhengyuan Liu, Nancy F. Chen
Reading Turn by Turn: Hierarchical Attention Architecture for Spoken Dialogue Comprehension.

SpeechComm2018 Van Tung Pham, Haihua Xu, Xiong Xiao, Nancy F. Chen, Eng Siong Chng, Haizhou Li 0001, 
Re-ranking spoken term detection with acoustic exemplars of keywords.

TASLP2018 Van Hai Do, Nancy F. Chen, Boon Pang Lim, Mark A. Hasegawa-Johnson, 
Multitask Learning for Phone Recognition of Underresourced Languages Using Mismatched Transcription.

ICASSP2018 Wenda Chen, Mark Hasegawa-Johnson, Nancy F. Chen
Recognizing Zero-Resourced Languages Based on Mismatched Machine Transcriptions.

ICASSP2018 Wei Li 0119, Nancy F. Chen, Sabato Marco Siniscalchi, Chin-Hui Lee, 
Improving Mandarin Tone Mispronunciation Detection for Non-Native Learners with Soft-Target Tone Labels and BLSTM-Based Deep Models.

Interspeech2018 Wenda Chen, Mark Hasegawa-Johnson, Nancy F. Chen
Topic and Keyword Identification for Low-resourced Speech Using Cross-Language Transfer Learning.

EMNLP2018 Minh Nguyen 0002, Hoang Gia Ngo, Nancy F. Chen
Multimodal neural pronunciation modeling for spoken languages with logographic origin.

#89  | Frank K. Soong | Google Scholar   DBLP
Venues | Interspeech: 15 | ICASSP: 14 | TASLP: 3 | SpeechComm: 3
Years | 2022: 6 | 2021: 4 | 2020: 5 | 2019: 7 | 2018: 3 | 2017: 4 | 2016: 6
ISCA Sections | speech synthesis: 9 | singing voice computing and processing in music: 1 | voice conversion and speech synthesis: 1 | applications in education and learning: 1 | language recognition: 1 | l1 and l2 acquisition: 1 | short utterances speaker recognition: 1
IEEE Keywords | speech synthesis: 8 | natural language processing: 7 | speaker recognition: 4 | speech recognition: 4 | text analysis: 3 | regression analysis: 3 | recurrent neural nets: 3 | autoregressive processes: 2 | neural tts: 2 | linguistics: 2 | pronunciation assessment: 2 | computer assisted language learning: 2 | computer aided instruction: 2 | ordinal regression: 2 | unit selection: 2 | lpcnet: 2 | filtering theory: 2 | vocoders: 2 | probability: 2 | deep neural networks: 2 | style transfer: 1 | variational inference: 1 | disjoint datasets: 1 | style and speaker attributes: 1 | computational linguistics: 1 | text to speech (tts): 1 | long form: 1 | cross sentence: 1 | mispronunciation detection: 1 | universal ordinal regression: 1 | phoneme recognition: 1 | acoustic phonetic linguistic embeddings: 1 | computer aided pronunciation training: 1 | mispronunciation detection and diagnosis: 1 | mos prediction: 1 | mean bias network: 1 | sensitivity analysis: 1 | speech intelligibility: 1 | video signal processing: 1 | correlation methods: 1 | speech quality assessment: 1 | medical image processing: 1 | goodness of pronunciation: 1 | hybrid text to speech: 1 | speech coding: 1 | sequence to sequence: 1 | spectral analysis: 1 | trajectory tiling: 1 | text to speech: 1 | lp mdn: 1 | neural vocoder: 1 | prosody: 1 | bert: 1 | kl divergence: 1 | absolute f0 difference: 1 | dynamic acoustic difference: 1 | keyword spotting: 1 | asr: 1 | esl: 1 | domain adversarial training: 1 | call: 1 | speech fluency assessment: 1 | anchored reference sample: 1 | pattern classification: 1 | mean opinion score (mos): 1 | computer assisted language learning (call): 1 | permutation invariant training: 1 | pitch tracking: 1 | source separation: 1 | speech separation: 1 | deep clustering: 1 | text dependent speaker verification: 1 | dynamic time warping: 1 | sequential speaker characteristics: 1 | support vector machines: 1 | speaker supervector: 1 | recurrent neural network: 1 | improved time frequency trajectory excitation vocoder: 1 | long short term memory: 1 | statistical parametric speech synthesis: 1 | speaker adaptation: 1 | hidden markov models: 1 | cross lingual: 1 | kullback leibler divergence: 1
Most Publications | 2007: 25 | 2008: 24 | 2006: 23 | 2012: 16 | 2021: 15

Affiliations
Microsoft Research Asia, Beijing, China
Chinese University of Hong Kong (CUHK), Department of Systems Engineering and Engineering Management, Hong Kong
Bell Labs Research, Murray Hill, NJ, USA
University of Stanford, Department of Electrical Engineering, CA, USA (PhD)

TASLP2022 Xiaochun An, Frank K. Soong, Lei Xie 0001, 
Disentangling Style and Speaker Attributes for TTS Style Transfer.

TASLP2022 Liumeng Xue, Frank K. Soong, Shaofei Zhang, Lei Xie 0001, 
ParaTTS: Learning Linguistic and Prosodic Cross-Sentence Information in Paragraph-Based TTS.

ICASSP2022 Shaoguang Mao, Frank K. Soong, Yan Xia 0005, Jonathan Tien, 
A Universal Ordinal Regression for Assessing Phoneme-Level Pronunciation.

ICASSP2022 Wenxuan Ye, Shaoguang Mao, Frank K. Soong, Wenshan Wu, Yan Xia 0005, Jonathan Tien, Zhiyong Wu 0001, 
An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings.

Interspeech2022 Mutian He 0001, Jingzhou Yang, Lei He 0005, Frank K. Soong
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge.

Interspeech2022 Haohan Guo, Feng-Long Xie, Frank K. Soong, Xixin Wu, Helen Meng, 
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS.

ICASSP2021 Yichong Leng, Xu Tan 0003, Sheng Zhao, Frank K. Soong, Xiang-Yang Li 0001, Tao Qin, 
MBNET: MOS Prediction for Synthesized Speech with Mean-Bias Network.

ICASSP2021 Bin Su, Shaoguang Mao, Frank K. Soong, Yan Xia 0005, Jonathan Tien, Zhiyong Wu 0001, 
Improving Pronunciation Assessment Via Ordinal Regression with Anchored Reference Samples.

ICASSP2021 Feng-Long Xie, Xinhui Li, Wen-Chao Su, Li Lu, Frank K. Soong
A New High Quality Trajectory Tiling Based Hybrid TTS In Real Time.

Interspeech2021 Xiaochun An, Frank K. Soong, Lei Xie 0001, 
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-End Neural TTS.

ICASSP2020 Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank K. Soong, Hong-Goo Kang, 
Improving LPCNET-Based Text-to-Speech with Linear Prediction-Structured Mixture Density Network.

ICASSP2020 Yujia Xiao, Lei He 0005, Huaiping Ming, Frank K. Soong
Improving Prosody with Linguistic and Bert Derived Features in Multi-Speaker Based Mandarin Chinese Neural TTS.

ICASSP2020 Feng-Long Xie, Xinhui Li, Bo Liu, Yibin Zheng, Li Meng, Li Lu, Frank K. Soong
An Improved Frame-Unit-Selection Based Voice Conversion System Without Parallel Training Data.

Interspeech2020 Yang Cui, Xi Wang 0016, Lei He 0005, Frank K. Soong
An Efficient Subband Linear Prediction for LPCNet-Based Neural Synthesis.

Interspeech2020 Yuanbo Hou, Frank K. Soong, Jian Luan 0001, Shengchen Li, 
Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music.

SpeechComm2019 Feng-Long Xie, Frank K. Soong, Haifeng Li 0001, 
Voice conversion with SI-DNN and KL divergence based mapping without parallel training data.

ICASSP2019 Jingyong Hou, Pengcheng Guo, Sining Sun, Frank K. Soong, Wenping Hu, Lei Xie 0001, 
Domain Adversarial Training for Improving Keyword Spotting Performance of ESL Speech.

ICASSP2019 Shaoguang Mao, Zhiyong Wu 0001, Jingshuai Jiang, Peiyun Liu, Frank K. Soong
NN-based Ordinal Regression for Assessing Fluency of ESL Speech.

ICASSP2019 Ke Wang, Frank K. Soong, Lei Xie 0001, 
A Pitch-aware Approach to Single-channel Speech Separation.

Interspeech2019 Haohan Guo, Frank K. Soong, Lei He 0005, Lei Xie 0001, 
A New GAN-Based End-to-End TTS Training Algorithm.

#90  | George Saon | Google Scholar   DBLP
Venues: Interspeech: 23, ICASSP: 12
Years: 2022: 10, 2021: 7, 2020: 2, 2019: 7, 2018: 1, 2017: 5, 2016: 3
ISCA Sections: asr: 2; asr neural network training: 2; conversational telephone speech recognition: 2; novel models and training methods for asr: 1; neural transducers, streaming asr and novel asr models: 1; multi-, cross-lingual and other topics in asr: 1; other topics in speech recognition: 1; streaming for asr/rnn transducers: 1; neural network training methods for asr: 1; spoken language understanding: 1; language and lexical modeling for asr: 1; novel neural network architectures for asr: 1; streaming asr: 1; asr neural network architectures and training: 1; resources – annotation – evaluation: 1; sequence-to-sequence speech recognition: 1; neural network acoustic models for asr: 1; neural networks for language modeling: 1; neural networks in speech recognition: 1; acoustic model adaptation: 1
IEEE Keywords: speech recognition: 10; recurrent neural nets: 8; automatic speech recognition: 6; natural language processing: 3; spoken language understanding: 3; decoding: 2; rnn transducers: 2; end to end models: 2; text analysis: 2; end to end asr: 2; speaker recognition: 2; lstm: 2; neural net architecture: 1; spiking neural networks: 1; brain: 1; synapse types: 1; spiking neural unit: 1; rnn t: 1; neurophysiology: 1; speech coding: 1; data handling: 1; encoder decoder: 1; atis: 1; transducers: 1; attention: 1; natural languages: 1; end to end models: 1; language model customization: 1; adaptation: 1; data analysis: 1; recurrent neural network transducer: 1; multiplicative integration: 1; sensor fusion: 1; noise injection: 1; broadcast news: 1; deep neural networks: 1; parallel computing: 1; graphics processing units: 1; switchboard: 1; parallel processing: 1; hidden markov models: 1; direct acoustics to word models: 1; multilingual: 1; acoustic model: 1; vgg: 1; keyword search: 1; diarization: 1; event detection: 1; long short term memory: 1; music detection: 1
Most Publications: 2022: 18, 2021: 16, 2017: 15, 2019: 12, 2020: 8

ICASSP2022 Thomas Bohnstingl, Ayush Garg 0006, Stanislaw Wozniak, George Saon, Evangelos Eleftheriou, Angeliki Pantazi, 
Speech Recognition Using Biologically-Inspired Neural Networks.

ICASSP2022 Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas 0001, Brian Kingsbury, George Saon
Improving End-to-end Models for Set Prediction in Spoken Language Understanding.

ICASSP2022 Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Brian Kingsbury, George Saon
Towards Reducing the Need for Speech Training Data to Build Spoken Language Understanding Systems.

ICASSP2022 Samuel Thomas 0001, Brian Kingsbury, George Saon, Hong-Kwang Jeff Kuo, 
Integrating Text Inputs for Training and Adapting RNN Transducer ASR Models.

Interspeech2022 Xiaodong Cui, George Saon, Tohru Nagano, Masayuki Suzuki, Takashi Fukuda, Brian Kingsbury, Gakuto Kurata, 
Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing.

Interspeech2022 Andrea Fasoli, Chia-Yu Chen, Mauricio J. Serrano, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Kailash Gopalakrishnan, 
Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization.

Interspeech2022 Takashi Fukuda, Samuel Thomas 0001, Masayuki Suzuki, Gakuto Kurata, George Saon, Brian Kingsbury, 
Global RNN Transducer Models For Multi-dialect Speech Recognition.

Interspeech2022 Zvi Kons, Hagai Aronowitz, Edmilson da Silva Morais, Matheus Damasceno, Hong-Kwang Kuo, Samuel Thomas 0001, George Saon
Extending RNN-T-based speech recognition systems with emotion and language classification.

Interspeech2022 Jiatong Shi, George Saon, David Haws, Shinji Watanabe 0001, Brian Kingsbury, 
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States.

Interspeech2022 Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Nobuyasu Itoh, George Saon
Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems.

ICASSP2021 Samuel Thomas 0001, Hong-Kwang Jeff Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, 
RNN Transducer Models for Spoken Language Understanding.

ICASSP2021 George Saon, Zoltán Tüske, Daniel Bolaños, Brian Kingsbury, 
Advancing RNN Transducer Technology for Speech Recognition.

Interspeech2021 Xiaodong Cui, Brian Kingsbury, George Saon, David Haws, Zoltán Tüske, 
Reducing Exposure Bias in Training Recurrent Neural Network Transducers.

Interspeech2021 Andrea Fasoli, Chia-Yu Chen, Mauricio J. Serrano, Xiao Sun, Naigang Wang, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Wei Zhang 0022, Zoltán Tüske, Kailash Gopalakrishnan, 
4-Bit Quantization of LSTM-Based Speech Recognition Models.

Interspeech2021 Jatin Ganhotra, Samuel Thomas 0001, Hong-Kwang Jeff Kuo, Sachindra Joshi, George Saon, Zoltán Tüske, Brian Kingsbury, 
Integrating Dialog History into End-to-End Spoken Language Understanding Systems.

Interspeech2021 Gakuto Kurata, George Saon, Brian Kingsbury, David Haws, Zoltán Tüske, 
Improving Customization of Neural Transducers by Mitigating Acoustic Mismatch of Synthesized Audio.

Interspeech2021 Zoltán Tüske, George Saon, Brian Kingsbury, 
On the Limit of English Conversational Speech Recognition.

Interspeech2020 Gakuto Kurata, George Saon
Knowledge Distillation from Offline to Streaming RNN Transducer for End-to-End Speech Recognition.

Interspeech2020 Zoltán Tüske, George Saon, Kartik Audhkhasi, Brian Kingsbury, 
Single Headed Attention Based Sequence-to-Sequence Model for State-of-the-Art Results on Switchboard.

ICASSP2019 George Saon, Zoltán Tüske, Kartik Audhkhasi, Brian Kingsbury, 
Sequence Noise Injected Training for End-to-end Speech Recognition.

#91  | Hemant A. Patil | Google Scholar   DBLP
Venues: Interspeech: 23, ICASSP: 9, SpeechComm: 2, TASLP: 1
Years: 2022: 1, 2021: 3, 2020: 1, 2019: 5, 2018: 10, 2017: 6, 2016: 9
ISCA Sections: speech analysis and representation: 4; spoofing detection: 3; special session: 2; music and audio processing: 2; speech coding and privacy: 1; speaker recognition and anti-spoofing: 1; speech and audio characterization and segmentation: 1; model training for asr: 1; low resource speech recognition challenge for indian languages: 1; voice conversion and speech synthesis: 1; novel paradigms for direct synthesis based on speech-related biosignals: 1; speaker database and anti-spoofing: 1; speech analysis: 1; acoustic modeling with neural networks: 1; speech enhancement and noise reduction: 1; speaker recognition: 1
IEEE Keywords: speech recognition: 4; gaussian processes: 3; cepstral analysis: 2; mixture models: 2; speaker recognition: 2; voice conversion: 2; hidden markov models: 2; filterbank: 2; boltzmann machines: 2; channel bank filters: 2; speech synthesis: 2; signal classification: 1; infant cry classification: 1; gmm: 1; short time fourier transform: 1; support vector machines: 1; fourier transforms: 1; constant q transform: 1; svm: 1; gan: 1; whisper: 1; decoding: 1; mspec net: 1; autoencoder: 1; non audible murmur: 1; spoof: 1; replay: 1; automatic speaker verification (asv): 1; voice activity detection: 1; replay configurations (rc): 1; reverberation: 1; iterative methods: 1; inca: 1; vc: 1; pattern classification: 1; metric learning: 1; lmnn: 1; mean square error methods: 1; articulatory features: 1; acoustic to articulatory inversion: 1; frequency warping: 1; amplitude scaling: 1; speech: 1; gaussian mixture model: 1; subband filters: 1; frequency domain analysis: 1; convrbm: 1; unsupervised learning: 1; auditory processing: 1; anti spoofing: 1; f0: 1; mfcc: 1; soe: 1; vocoders: 1; cfccif: 1; accent: 1; speech intelligibility: 1; phrase: 1; synthetic speech: 1; fujisaki model: 1; convolutional rbm: 1; rectified linear units: 1; pooling: 1
Most Publications: 2017: 28, 2018: 27, 2022: 24, 2014: 20, 2020: 17


ICASSP2022 Hemant A. Patil, Ankur T. Patil, Aastha Kachhi, 
Constant Q Cepstral coefficients for classification of normal vs. Pathological infant cry.

SpeechComm2021 Madhu R. Kamble, Hemlata Tak, Hemant A. Patil
Amplitude and Frequency Modulation-based features for detection of replay Spoof Speech.

SpeechComm2021 Meet H. Soni, Hemant A. Patil
Non-intrusive quality assessment of noise-suppressed speech using unsupervised deep features.

Interspeech2021 Gauri P. Prajapati, Dipesh K. Singh, Preet P. Amin, Hemant A. Patil
Voice Privacy Through x-Vector and CycleGAN-Based Anonymization.

ICASSP2020 Harshit Malaviya, Jui Shah, Maitreya Patel, Jalansh Munshi, Hemant A. Patil
Mspec-Net : Multi-Domain Speech Conversion Network.

ICASSP2019 Madhu R. Kamble, Hemant A. Patil
Analysis of Reverberation via Teager Energy Features for Replay Spoof Speech Detection.

ICASSP2019 Nirmesh J. Shah, Hemant A. Patil
Novel Metric Learning for Non-parallel Voice Conversion.

Interspeech2019 Ankur T. Patil, Rajul Acharya, Pulikonda Krishna Aditya Sai, Hemant A. Patil
Energy Separation-Based Instantaneous Frequency Estimation for Cochlear Cepstral Feature for Replay Spoof Detection.

Interspeech2019 Nirmesh J. Shah, Hemant A. Patil
Phone Aware Nearest Neighbor Technique Using Spectral Transition Measure for Non-Parallel Voice Conversion.

Interspeech2019 Nirmesh J. Shah, Hardik B. Sailor, Hemant A. Patil
Whether to Pretrain DNN or not?: An Empirical Analysis for Voice Conversion.

Interspeech2018 Madhu R. Kamble, Hemant A. Patil
Novel Variable Length Energy Separation Algorithm Using Instantaneous Amplitude Features for Replay Detection.

Interspeech2018 Madhu R. Kamble, Hemlata Tak, Hemant A. Patil
Effectiveness of Speech Demodulation-Based Features for Replay Detection.

Interspeech2018 Hardik B. Sailor, Maddala Venkata Siva Krishna, Diksha Chhabra, Ankur T. Patil, Madhu R. Kamble, Hemant A. Patil
DA-IICT/IIITV System for Low Resource Speech Recognition Challenge 2018.

Interspeech2018 Hardik B. Sailor, Madhu R. Kamble, Hemant A. Patil
Auditory Filterbank Learning for Temporal Modulation Features in Replay Spoof Speech Detection.

Interspeech2018 Hardik B. Sailor, Hemant A. Patil
Auditory Filterbank Learning Using ConvRBM for Infant Cry Classification.

Interspeech2018 Nirmesh J. Shah, Maulik C. Madhavi, Hemant A. Patil
Unsupervised Vocal Tract Length Warped Posterior Features for Non-Parallel Voice Conversion.

Interspeech2018 Nirmesh J. Shah, Hemant A. Patil
Effectiveness of Dynamic Features in INCA and Temporal Context-INCA.

Interspeech2018 Neil Shah, Nirmesh J. Shah, Hemant A. Patil
Effectiveness of Generative Adversarial Network for Non-Audible Murmur-to-Whisper Speech Conversion.

Interspeech2018 Hemlata Tak, Hemant A. Patil
Novel Linear Frequency Residual Cepstral Features for Replay Attack Detection.

Interspeech2018 Prasad Tapkir, Hemant A. Patil
Novel Empirical Mode Decomposition Cepstral Features for Replay Spoof Detection.

#92  | Jiangyan Yi | Google Scholar   DBLP
Venues: Interspeech: 20, ICASSP: 8, TASLP: 6, SpeechComm: 1
Years: 2023: 1, 2022: 4, 2021: 10, 2020: 11, 2019: 7, 2018: 1, 2017: 1
ISCA Sections: speech synthesis: 3; topics in asr: 2; voice conversion and adaptation: 2; asr: 1; privacy-preserving machine learning for audio & speech processing: 1; search/decoding techniques and confidence measures for asr: 1; speech coding and privacy: 1; computational resource constrained speech recognition: 1; multi-channel audio and emotion recognition: 1; speech enhancement: 1; asr neural network architectures: 1; sequence-to-sequence speech recognition: 1; spoken term detection, confidence measure, and end-to-end speech recognition: 1; speech and audio source separation and scene analysis: 1; nn architectures for asr: 1; speech recognition: 1
IEEE Keywords: speech recognition: 9; speech synthesis: 6; natural language processing: 6; end to end: 5; transfer learning: 4; text analysis: 3; speaker recognition: 3; low resource: 3; speech coding: 2; text based speech editing: 2; text editing: 2; end to end model: 2; decoding: 2; attention: 2; waveform generators: 1; stochastic processes: 1; vocoder: 1; filtering theory: 1; deterministic plus stochastic: 1; multiband excitation: 1; noise control: 1; vocoders: 1; coarse to fine decoding: 1; mask prediction: 1; text to speech: 1; one shot learning: 1; mask and prediction: 1; cross modal: 1; bert: 1; non autoregressive: 1; fast: 1; autoregressive processes: 1; language modeling: 1; teacher student learning: 1; robust end to end speech recognition: 1; speech distortion: 1; speech enhancement: 1; speech transformer: 1; gated recurrent fusion: 1; decoupled transformer: 1; code switching: 1; automatic speech recognition: 1; bi level decoupling: 1; prosody modeling: 1; personalized speech synthesis: 1; speaking style modeling: 1; few shot speaker adaptation: 1; prosody and voice factorization: 1; the m2voc challenge: 1; optimisation: 1; prosody transfer: 1; speaker adaptation: 1; optimization strategy: 1; audio signal processing: 1; adversarial training: 1; cross lingual: 1; self attention: 1; speech embedding: 1; word embedding: 1; punctuation prediction: 1; language invariant: 1; adversarial: 1; adversarial multilingual training: 1; bottleneck features: 1; deep neural networks: 1
Most Publications: 2022: 26, 2020: 24, 2021: 21, 2019: 19, 2023: 9

SpeechComm2023 Jiangyan Yi, Jianhua Tao, Ye Bai, Zhengkun Tian, Cunhang Fan, 
Transfer knowledge for punctuation prediction via adversarial training.

TASLP2022 Tao Wang 0074, Ruibo Fu, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, 
NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation.

TASLP2022 Tao Wang 0074, Jiangyan Yi, Ruibo Fu, Jianhua Tao, Zhengqi Wen, 
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing.

ICASSP2022 Tao Wang 0074, Jiangyan Yi, Liqun Deng, Ruibo Fu, Jianhua Tao, Zhengqi Wen, 
Context-Aware Mask Prediction Network for End-to-End Text-Based Speech Editing.

Interspeech2022 Shuai Zhang 0014, Jiangyan Yi, Zhengkun Tian, Jianhua Tao, Yu Ting Yeung, Liqun Deng, 
Reducing Multilingual Context Confusion for End-to-End Code-Switching Automatic Speech Recognition.

TASLP2021 Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang 0014, 
Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT.

TASLP2021 Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Zhengkun Tian, Shuai Zhang 0014, 
Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data.

TASLP2021 Cunhang Fan, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Bin Liu 0041, Zhengqi Wen, 
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition.

ICASSP2021 Shuai Zhang 0014, Jiangyan Yi, Zhengkun Tian, Ye Bai, Jianhua Tao, Zhengqi Wen, 
Decoupling Pronunciation and Language for End-to-End Code-Switching Automatic Speech Recognition.

ICASSP2021 Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Tao Wang 0074, Chunyu Qiang, 
Bi-Level Style and Prosody Decoupling Modeling for Personalized End-to-End Speech Synthesis.

ICASSP2021 Tao Wang 0074, Ruibo Fu, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Chunyu Qiang, Shiming Wang, 
Prosody and Voice Factorization for Few-Shot Speaker Adaptation in the Challenge M2voc 2021.

Interspeech2021 Shuai Zhang 0014, Jiangyan Yi, Zhengkun Tian, Ye Bai, Jianhua Tao, Xuefei Liu, Zhengqi Wen, 
End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-Switching Speech Recognition.

Interspeech2021 Haoxin Ma, Jiangyan Yi, Jianhua Tao, Ye Bai, Zhengkun Tian, Chenglong Wang, 
Continual Learning for Fake Audio Detection.

Interspeech2021 Zhengkun Tian, Jiangyan Yi, Ye Bai, Jianhua Tao, Shuai Zhang 0014, Zhengqi Wen, 
FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization.

Interspeech2021 Jiangyan Yi, Ye Bai, Jianhua Tao, Haoxin Ma, Zhengkun Tian, Chenglong Wang, Tao Wang 0074, Ruibo Fu, 
Half-Truth: A Partially Fake Audio Detection Dataset.

ICASSP2020 Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Tao Wang 0074, 
Focusing on Attention: Prosody Transfer and Adaptative Optimization Strategy for Multi-Speaker End-to-End Speech Synthesis.

Interspeech2020 Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang 0014, 
Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition.

Interspeech2020 Cunhang Fan, Jianhua Tao, Bin Liu 0041, Jiangyan Yi, Zhengqi Wen, 
Gated Recurrent Fusion of Spatial and Spectral Features for Multi-Channel Speech Separation with Deep Embedding Representations.

Interspeech2020 Cunhang Fan, Jianhua Tao, Bin Liu 0041, Jiangyan Yi, Zhengqi Wen, 
Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations.

Interspeech2020 Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Chunyu Qiang, Tao Wang 0074, 
Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis.

#93  | Jesper Jensen 0001 | Google Scholar   DBLP
Venues: TASLP: 15, ICASSP: 9, Interspeech: 7, SpeechComm: 3
Years: 2023: 1, 2022: 2, 2021: 5, 2020: 7, 2019: 5, 2018: 4, 2017: 5, 2016: 5
ISCA Sections: speech intelligibility: 2; speech enhancement and intelligibility: 1; speech synthesis: 1; noise reduction and intelligibility: 1; speech recognition and beyond: 1; models of speech perception: 1
IEEE Keywords: speech enhancement: 16; speech intelligibility: 11; array signal processing: 5; maximum likelihood estimation: 5; speech recognition: 4; audio visual systems: 3; audio signal processing: 3; deep neural networks: 3; hearing aids: 3; wiener filters: 3; reverberation: 3; microphone arrays: 3; maximum likelihood: 2; hearing: 2; multi microphone: 2; computational complexity: 2; speech separation: 2; sensor fusion: 2; mean square error methods: 2; filtering theory: 2; spectral analysis: 2; power spectral density estimation: 2; intelligibility: 2; least mean squares methods: 2; enhanced speech: 2; speech in noise: 2; time frequency analysis: 2; binaural speech intelligibility prediction: 2; psd estimation: 2; turn taking: 1; beamforming: 1; speech behavior: 1; signal denoising: 1; asii: 1; approximation theory: 1; beamformer: 1; speech intelligibility enhancement: 1; spectro temporal modulation: 1; regression analysis: 1; modulation: 1; speech quality model: 1; keyword spotting: 1; keyword embedding: 1; deep metric learning: 1; text analysis: 1; multi condition training: 1; noise robustness: 1; loss function: 1; audio visual processing: 1; speech synthesis: 1; sound source separation: 1; source separation: 1; multi task learning: 1; face landmarks: 1; audio visual: 1; deep learning (artificial intelligence): 1; speech inpainting: 1; supervised learning: 1; fully convolutional neural networks: 1; time domain: 1; objective intelligibility: 1; gradient methods: 1; multichannel speech enhancement: 1; probability: 1; kalman filter: 1; recursive expectation maximization: 1; speech presence probability: 1; expectation maximisation algorithm: 1; own voice retrieval: 1; multi microphone speech enhancement: 1; intrusive: 1; speech: 1; monaural: 1; prediction: 1; mutual information: 1; gaussian processes: 1; pattern classification: 1; decoding: 1; vocabulary: 1; human auditory system: 1; maximum likelihood classifier: 1; gaussian mixture model: 1; minimum mean square error estimator: 1; correlation theory: 1; objective functions: 1; training targets: 1; audio visual speech enhancement: 1; convolution: 1; nonintrusive speech intelligibility prediction: 1; convolutional neural networks: 1; dereverberation: 1; diffuse sound: 1; array processing: 1; transfer functions: 1; relative transfer function: 1; hearing aid: 1; direction of arrival estimation: 1; sound source localization: 1; ideal ratio mask: 1; generalizability: 1; non intrusive speech intelligibility prediction: 1; cocktail party problem: 1; permutation invariant training: 1; cnn: 1; dnn: 1; binaural advantage: 1; speech transmission: 1; objective distortion measures: 1; speech intelligibility prediction: 1; modulated noise sources: 1; noise reduction: 1; error statistics: 1; microphone array: 1; cramér–rao lower bound: 1; microphones: 1; interference (signal): 1; speaker recognition: 1; maximum likelihood estimator: 1; speech dereverberation: 1; isotropic sound field: 1
Most Publications: 2019: 16, 2020: 15, 2018: 15, 2016: 14, 2017: 13

Affiliations
Aalborg University, Department of Electronic Systems, Denmark
Oticon A/S, Smørum, Denmark
Delft University of Technology, The Netherlands

SpeechComm2023 Iván López-Espejo, Amin Edraki, Wai-Yip Chan, Zheng-Hua Tan, Jesper Jensen 0001
On the deficiency of intelligibility metrics as proxies for subjective intelligibility.

TASLP2022 Poul Hoang, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen 0001
Multichannel Speech Enhancement With Own Voice-Based Interfering Speech Suppression for Hearing Assistive Devices.

ICASSP2022 Andreas Jonas Fuglsig, Jan Østergaard, Jesper Jensen 0001, Lars Søndergaard Bertelsen, Peter Mariager, Zheng-Hua Tan, 
Joint Far- and Near-End Speech Intelligibility Enhancement Based on the Approximated Speech Intelligibility Index.

TASLP2021 Amin Edraki, Wai-Yip Chan, Jesper Jensen 0001, Daniel Fogerty, 
Speech Intelligibility Prediction Using Spectro-Temporal Modulation Analysis.

TASLP2021 Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen 0001
A Novel Loss Function and Training Strategy for Noise-Robust Keyword Spotting.

TASLP2021 Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu 0004, Meng Yu 0003, Dong Yu 0001, Jesper Jensen 0001
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation.

ICASSP2021 Giovanni Morrone, Daniel Michelsanti, Zheng-Hua Tan, Jesper Jensen 0001
Audio-Visual Speech Inpainting with Deep Learning.

Interspeech2021 Amin Edraki, Wai-Yip Chan, Jesper Jensen 0001, Daniel Fogerty, 
A Spectro-Temporal Glimpsing Index (STGI) for Speech Intelligibility Prediction.

SpeechComm2020 Daniel Michelsanti, Zheng-Hua Tan, Sigurdur Sigurdsson, Jesper Jensen 0001
Deep-learning-based audio-visual speech enhancement in presence of Lombard effect.

TASLP2020 Morten Kolbæk, Zheng-Hua Tan, Søren Holdt Jensen, Jesper Jensen 0001
On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement.

TASLP2020 Juan M. Martín-Doñas, Jesper Jensen 0001, Zheng-Hua Tan, Angel M. Gomez, Antonio M. Peinado, 
Online Multichannel Speech Enhancement Based on Recursive EM and DNN-Based Speech Presence Estimation.

ICASSP2020 Poul Hoang, Zheng-Hua Tan, Thomas Lunner, Jan Mark de Haan, Jesper Jensen 0001
Maximum Likelihood Estimation of the Interference-Plus-Noise Cross Power Spectral Density Matrix for Own Voice Retrieval.

ICASSP2020 Mathias Bach Pedersen, Asger Heidemann Andersen, Søren Holdt Jensen, Jesper Jensen 0001
A Neural Network for Monaural Intrusive Speech Intelligibility Prediction.

Interspeech2020 Daniel Michelsanti, Olga Slizovskaia, Gloria Haro, Emilia Gómez, Zheng-Hua Tan, Jesper Jensen 0001
Vocoder-Based Speech Synthesis from Silent Videos.

Interspeech2020 Mathias Bach Pedersen, Morten Kolbæk, Asger Heidemann Andersen, Søren Holdt Jensen, Jesper Jensen 0001
End-to-End Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks.

TASLP2019 Mohsen Zareian Jahromi, Adel Zahedi, Jesper Jensen 0001, Jan Østergaard, 
Information Loss in the Human Auditory System.

TASLP2019 Morten Kolbæk, Zheng-Hua Tan, Jesper Jensen 0001, 
On the Relationship Between Short-Time Objective Intelligibility and Short-Time Spectral-Amplitude Mean-Square Error for Speech Enhancement.

ICASSP2019 Daniel Michelsanti, Zheng-Hua Tan, Sigurdur Sigurdsson, Jesper Jensen 0001
On Training Targets and Objective Functions for Deep-learning-based Audio-visual Speech Enhancement.

Interspeech2019 Amin Edraki, Wai-Yip Chan, Jesper Jensen 0001, Daniel Fogerty, 
Improvement and Assessment of Spectro-Temporal Modulation Analysis for Speech Intelligibility Estimation.

Interspeech2019 Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen 0001
Keyword Spotting for Hearing Assistive Devices Robust to External Speakers.

#94  | Xugang Lu | Google Scholar   DBLP
Venues: Interspeech: 22, ICASSP: 6, TASLP: 4, NeurIPS: 1, SpeechComm: 1
Years: 2022: 3, 2021: 5, 2020: 5, 2019: 7, 2018: 4, 2017: 2, 2016: 8
ISCA Sections: speech enhancement and intelligibility: 3; speech enhancement: 2; speech quality assessment: 1; speaker and language recognition: 1; single-channel speech enhancement: 1; large-scale evaluation of short-duration speaker verification: 1; cross-lingual and multilingual asr: 1; spoken term detection, confidence measure, and end-to-end speech recognition: 1; nn architectures for asr: 1; speech and audio classification: 1; acoustic modelling: 1; audio events and acoustic scenes: 1; language identification: 1; speaker and language recognition applications: 1; speech enhancement and noise reduction: 1; language modeling for conversational speech and confidence measures: 1; decoding, system combination: 1; language recognition: 1; co-inference of production and acoustics: 1
IEEE Keywords: speech recognition: 6; speaker recognition: 3; spoken language identification: 3; speech enhancement: 3; pattern classification: 2; knowledge distillation: 2; generative model: 1; bayes methods: 1; speaker verification: 1; affine transforms: 1; discriminative model: 1; joint bayesian model: 1; optimal transport: 1; unsupervised domain adaptation: 1; statistical distributions: 1; short utterances: 1; internal representation learning: 1; deep neural networks: 1; ensemble learning: 1; decision trees: 1; generalizability: 1; dynamically sized decision tree: 1; signal denoising: 1; regression analysis: 1; deep denoising autoencoder: 1; decoding: 1; signal classification: 1; unsupervised learning: 1; interactive teacher student learning: 1; computer aided instruction: 1; teacher model optimization: 1; short utterance feature representation: 1; natural languages: 1; optimisation: 1; end to end speech enhancement: 1; mean square error methods: 1; speech intelligibility: 1; fully convolutional neural network: 1; automatic speech recognition: 1; raw waveform: 1; acoustic model: 1; semi supervised training: 1; dnn: 1; linear transformation network: 1; deep neural network: 1; singular value decomposition: 1; speaker adaptive training: 1; lfda: 1; discriminant analysis: 1; language identification: 1; i vector: 1
Most Publications: 2016: 17, 2020: 15, 2022: 14, 2021: 14, 2017: 13

Interspeech2022 Rong Chao, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao 0001, 
Perceptual Contrast Stretching on Target Feature for Speech Enhancement.

Interspeech2022 Kai Li 0018, Sheng Li 0010, Xugang Lu, Masato Akagi, Meng Liu, Lin Zhang, Chang Zeng, Longbiao Wang, Jianwu Dang 0001, Masashi Unoki, 
Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection.

Interspeech2022 Peng Shen, Xugang Lu, Hisashi Kawai, 
Transducer-based language embedding for spoken language identification.

TASLP2021 Xugang Lu, Peng Shen, Yu Tsao 0001, Hisashi Kawai, 
Coupling a Generative Model With a Discriminative Learning Framework for Speaker Verification.

ICASSP2021 Xugang Lu, Peng Shen, Yu Tsao 0001, Hisashi Kawai, 
Unsupervised Neural Adaptation Model Based on Optimal Transport for Spoken Language Identification.

Interspeech2021 Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao 0001, 
MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement.

Interspeech2021 Tsun-An Hsieh, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao 0001, 
Improving Perceptual Quality by Phone-Fortified Perceptual Loss Using Wasserstein Distance for Speech Enhancement.

NeurIPS2021 Hsin-Yi Lin, Huan-Hsin Tseng, Xugang Lu, Yu Tsao 0001, 
Unsupervised Noise Adaptive Speech Enhancement by Discriminator-Constrained Optimal Transport.

TASLP2020 Peng Shen, Xugang Lu, Sheng Li 0010, Hisashi Kawai, 
Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification.

TASLP2020 Cheng Yu, Ryandhimas E. Zezario, Syu-Siang Wang, Jonathan Sherman, Yi-Yen Hsieh, Xugang Lu, Hsin-Min Wang, Yu Tsao 0001, 
Speech Enhancement Based on Denoising Autoencoder With Multi-Branched Encoders.

ICASSP2020 Ryandhimas E. Zezario, Tassadaq Hussain, Xugang Lu, Hsin-Min Wang, Yu Tsao 0001, 
Self-Supervised Denoising Autoencoder with Linear Regression Decoder for Speech Enhancement.

Interspeech2020 Yen-Ju Lu, Chien-Feng Liao, Xugang Lu, Jeih-weih Hung, Yu Tsao 0001, 
Incorporating Broad Phonetic Information for Speech Enhancement.

Interspeech2020 Peng Shen, Xugang Lu, Hisashi Kawai, 
Investigation of NICT Submission for Short-Duration Speaker Verification Challenge 2020.

ICASSP2019 Peng Shen, Xugang Lu, Sheng Li 0010, Hisashi Kawai, 
Interactive Learning of Teacher-student Model for Short Utterance Spoken Language Identification.

Interspeech2019 Sheng Li 0010, Chenchen Ding, Xugang Lu, Peng Shen, Tatsuya Kawahara, Hisashi Kawai, 
End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition.

Interspeech2019 Sheng Li 0010, Xugang Lu, Chenchen Ding, Peng Shen, Tatsuya Kawahara, Hisashi Kawai, 
Investigating Radical-Based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese.

Interspeech2019 Sheng Li 0010, Raj Dabre, Xugang Lu, Peng Shen, Tatsuya Kawahara, Hisashi Kawai, 
Improving Transformer-Based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation.

Interspeech2019 Chien-Feng Liao, Yu Tsao 0001, Xugang Lu, Hisashi Kawai, 
Incorporating Symbolic Sequential Modeling for Speech Enhancement.

Interspeech2019 Xugang Lu, Peng Shen, Sheng Li 0010, Yu Tsao 0001, Hisashi Kawai, 
Class-Wise Centroid Distance Metric Learning for Acoustic Event Detection.

Interspeech2019 Ryandhimas E. Zezario, Szu-Wei Fu, Xugang Lu, Hsin-Min Wang, Yu Tsao 0001, 
Specialized Speech Enhancement Model Selection Based on Learned Non-Intrusive Quality Assessment Metric.

#95  | Julien Epps | Google Scholar   DBLP
Venues: Interspeech: 23, ICASSP: 7, SpeechComm: 4
Years: 2021: 4, 2020: 7, 2019: 6, 2018: 6, 2017: 6, 2016: 5
ISCA Sections: speaker recognition and anti-spoofing: 2; spoofing detection: 2; special session: 2; tools, corpora and resources: 1; speech in health: 1; computational paralinguistics: 1; topics in asr: 1; anti-spoofing and liveness detection: 1; automatic speech recognition for non-native children’s speech: 1; representation learning of emotion and paralinguistics: 1; emotion recognition and analysis: 1; speech pathology, depression, and medical applications: 1; representation learning for emotion: 1; speaker state and trait: 1; speech analysis and representation: 1; emotion modeling: 1; speaker and language recognition applications: 1; speaker states and traits: 1; speech analysis: 1; automatic assessment of emotions: 1
IEEE Keywords: smartphone speech: 2; depression classification: 2; speech articulation: 2; speaker recognition: 2; channel bank filters: 2; speech recognition: 2; emotion recognition: 2; mean square error methods: 1; signal classification: 1; machine learning: 1; smartphone: 1; speech landmarks: 1; diadochokinetic task: 1; psychology: 1; mental health screening: 1; naturalistic environments: 1; convolutional neural nets: 1; eigenvalues and eigenfunctions: 1; vocal tract coordination: 1; medical signal detection: 1; cnn: 1; digital filters: 1; transmission line cochlear model: 1; replay attack: 1; speaker verification: 1; frequency modulation: 1; amplitude modulation: 1; spoofing: 1; naturalistic environments: 1; natural language processing: 1; landmark bigrams: 1; smart phones: 1; band pass filters: 1; anti spoofing: 1; asvspoof 2017: 1; automatic speaker verification: 1; spatial differentiation: 1; iir filters: 1; security of data: 1; signal detection: 1; relevance vector machine: 1; regression analysis: 1; staircase regression: 1; emotion prediction: 1; phone log likelihood ratio: 1; gaussian processes: 1; object detection: 1; gaussian mixture model: 1; mixture models: 1; exchangeability: 1; martingale: 1; emotion change detection: 1
Most Publications: 2013: 20, 2019: 18, 2020: 13, 2011: 13, 2018: 12


SpeechComm2021 Tharshini Gunendradasan, Eliathamby Ambikairajah, Julien Epps, Vidhyasaharan Sethu, Haizhou Li 0001, 
An adaptive transmission line cochlear model based front-end for replay attack detection.

SpeechComm2021 Brian Stasak, Julien Epps, Heather T. Schatten, Ivan W. Miller, Emily Mower Provost, Michael F. Armey, 
Read speech voice quality and disfluency in individuals with recent suicidal ideation or suicide attempt.

ICASSP2021 Brian Stasak, Zhaocheng Huang, Dale Joachim, Julien Epps
Automatic Elicitation Compliance for Short-Duration Speech Based Depression Detection.

Interspeech2021 Beena Ahmed, Kirrie J. Ballard, Denis Burnham, Tharmakulasingam Sirojan, Hadi Mehmood, Dominique Estival, Elise Baker, Felicity Cox, Joanne Arciuli, Titia Benders, Katherine Demuth, Barbara Kelly, Chloé Diskin-Holdaway, Mostafa Ali Shahin, Vidhyasaharan Sethu, Julien Epps, Chwee Beng Lee, Eliathamby Ambikairajah, 
AusKidTalk: An Auditory-Visual Corpus of 3- to 12-Year-Old Australian Children's Speech.

SpeechComm2020 Brian Stasak, Julien Epps, Roland Goecke, 
Automatic depression classification based on affective read sentences: Opportunities for text-dependent analysis.

ICASSP2020 Zhaocheng Huang, Julien Epps, Dale Joachim, 
Exploiting Vocal Tract Coordination Using Dilated CNNS For Depression Detection In Naturalistic Environments.

Interspeech2020 Zhaocheng Huang, Julien Epps, Dale Joachim, Brian Stasak, James R. Williamson, Thomas F. Quatieri, 
Domain Adaptation for Enhancing Speech-Based Depression Detection in Natural Environmental Conditions Using Dilated CNNs.

Interspeech2020 Sadari Jayawardena, Julien Epps, Zhaocheng Huang, 
How Ordinal Are Your Data?

Interspeech2020 Hang Li, Siyuan Chen, Julien Epps
Augmenting Turn-Taking Prediction with Wearable Eye Activity During Conversation.

Interspeech2020 Prasanth Parasu, Julien Epps, Kaavya Sriskandaraja, Gajan Suthokumar, 
Investigating Light-ResNet Architecture for Spoofing Detection Under Mismatched Conditions.

Interspeech2020 Mostafa Ali Shahin, Renée Lu, Julien Epps, Beena Ahmed, 
UNSW System Description for the Shared Task on Automatic Speech Recognition for Non-Native Children's Speech.

ICASSP2019 Tharshini Gunendradasan, Saad Irtza, Eliathamby Ambikairajah, Julien Epps
Transmission Line Cochlear Model Based AM-FM Features for Replay Attack Detection.

ICASSP2019 Zhaocheng Huang, Julien Epps, Dale Joachim, 
Speech Landmark Bigrams for Depression Detection from Naturalistic Smartphone Speech.

ICASSP2019 Buddhi Wickramasinghe, Eliathamby Ambikairajah, Julien Epps, Vidhyasaharan Sethu, Haizhou Li 0001, 
Auditory Inspired Spatial Differentiation for Replay Spoofing Attack Detection.

Interspeech2019 Tharshini Gunendradasan, Eliathamby Ambikairajah, Julien Epps, Haizhou Li 0001, 
An Adaptive-Q Cochlear Model for Replay Spoofing Detection.

Interspeech2019 Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Julien Epps
Direct Modelling of Speech Emotion from Raw Speech.

Interspeech2019 Buddhi Wickramasinghe, Eliathamby Ambikairajah, Julien Epps
Biologically Inspired Adaptive-Q Filterbanks for Replay Spoofing Attack Detection.

Interspeech2018 Mia Atcheson, Vidhyasaharan Sethu, Julien Epps
Demonstrating and Modelling Systematic Time-varying Annotator Disagreement in Continuous Emotion Annotation.

Interspeech2018 Tharshini Gunendradasan, Buddhi Wickramasinghe, Phu Ngoc Le, Eliathamby Ambikairajah, Julien Epps
Detection of Replay-Spoofing Attacks Using Frequency Modulation Features.

Interspeech2018 Zhaocheng Huang, Julien Epps, Dale Joachim, Michael Chen, 
Depression Detection from Short Utterances via Diverse Smartphones in Natural Environmental Conditions.

#96  | Atsunori Ogawa | Google Scholar   DBLP
VenuesInterspeech: 19ICASSP: 12TASLP: 2SpeechComm: 1
Years2022: 22021: 22020: 42019: 92018: 52017: 82016: 4
ISCA Sectionacoustic models for asr: 2novel models and training methods for asr: 1speech enhancement and intelligibility: 1noise reduction and intelligibility: 1lm adaptation, lexical units and punctuation: 1speech enhancement: 1asr for noisy and far-field speech: 1asr neural network architectures: 1speech and audio source separation and scene analysis: 1neural networks for language modeling: 1adjusting to speaker, accent, and domain: 1end-to-end speech recognition: 1speech-enhancement: 1noise robust and far-field asr: 1multi-channel speech enhancement: 1acoustic model adaptation: 1speech enhancement and noise reduction: 1far-field, robustness and adaptation: 1
IEEE Keywordspeech recognition: 9natural language processing: 6speaker recognition: 3text analysis: 2automatic speech recognition: 2estimation theory: 2recurrent neural nets: 2deep neural network: 2domain adaptation: 2speech enhancement: 2language translation: 1sensor fusion: 1speech summarization: 1attention fusion: 1speech translation: 1rover: 1imbalanced datasets: 1confidence estimation: 1auxiliary features: 1bidirectional long short term memory (blstm): 1end to end (e2e) speech recognition: 1cluster voting: 1speaker clustering: 1age and gender estimation: 1i vectors: 1adversarial learning: 1speaker embedding: 1phoneme invariant feature: 1text independent speaker recognition: 1signal classification: 1deep neural networks: 1topic model: 1recurrent neural network language model: 1sequence summary network: 1decoding: 1encoder decoder: 1semi supervised learning: 1autoencoder: 1encoding: 1speech synthesis: 1speech separation/extraction: 1speaker attention: 1blind source separation: 1neural network: 1linear programming: 1integer programming: 1integer linear programming (ilp): 1oracle (upper bound) performance: 1maximum coverage of content words: 1compressive speech summarization: 1acoustic modeling: 1adaptive training: 1acoustic model adaptation: 1auxiliary feature: 1feedforward neural nets: 1robust asr: 1speaker extraction: 1speech mixtures: 1adaptation: 1language models: 1data augmentation: 1recurrent neural network: 1convolution: 1context adaptation: 1spatial diffuseness features: 1cnn based acoustic model: 1environmental robustness: 1inverse problems: 1gaussian processes: 1conditional density: 1model based feature enhancement: 1mixture models: 1mixture density network: 1probability: 1conditional random fields: 1word alignment network: 1correlation methods: 1error type classification: 1recognition accuracy estimation: 1
Most Publications2017: 152019: 102021: 92018: 92013: 9

Affiliations
URLs

ICASSP2022 Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe 0001, 
Integrating Multiple ASR Systems into NLP Backend with Attention Fusion.

Interspeech2022 Koharu Horii, Meiko Fukuda, Kengo Ohta, Ryota Nishimura, Atsunori Ogawa, Norihide Kitaoka, 
End-to-End Spontaneous Speech Recognition Using Disfluency Labeling.

ICASSP2021 Atsunori Ogawa, Naohiro Tawara, Takatomo Kano, Marc Delcroix, 
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition.

Interspeech2021 Ayako Yamamoto, Toshio Irino, Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, 
Comparison of Remote Experiments Using Crowdsourcing and Laboratory Experiments on Speech Intelligibility.

ICASSP2020 Naohiro Tawara, Hosana Kamiyama, Satoshi Kobashikawa, Atsunori Ogawa
Improving Speaker-Attribute Estimation by Voting Based on Speaker Cluster Information.

ICASSP2020 Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Marc Delcroix, Tetsuji Ogawa, 
Frame-Level Phoneme-Invariant Speaker Embedding for Text-Independent Speaker Recognition on Extremely Short Utterances.

Interspeech2020 Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Toshio Irino, 
Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-Based ASR System.

Interspeech2020 Atsunori Ogawa, Naohiro Tawara, Marc Delcroix, 
Language Model Data Augmentation Based on Text Domain Transfer.

ICASSP2019 Michael Hentschel, Marc Delcroix, Atsunori Ogawa, Tomoharu Iwata, Tomohiro Nakatani, 
A Unified Framework for Feature-based Domain Adaptation of Neural Network Language Models.

ICASSP2019 Shigeki Karita, Shinji Watanabe 0001, Tomoharu Iwata, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani, 
Semi-supervised End-to-end Speech Recognition Using Text-to-speech and Autoencoders.

ICASSP2019 Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Tomohiro Nakatani, 
A Unified Framework for Neural Speech Separation and Extraction.

ICASSP2019 Atsunori Ogawa, Tsutomu Hirao, Tomohiro Nakatani, Masaaki Nagata, 
ILP-based Compressive Speech Summarization with Content Word Coverage Maximization and Its Oracle Performance Analysis.

Interspeech2019 Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Katsuhiko Yamamoto, Toshio Irino, 
Predicting Speech Intelligibility of Enhanced Speech Using Phone Accuracy of DNN-Based ASR System.

Interspeech2019 Marc Delcroix, Shinji Watanabe 0001, Tsubasa Ochiai, Keisuke Kinoshita, Shigeki Karita, Atsunori Ogawa, Tomohiro Nakatani, 
End-to-End SpeakerBeam for Single Channel Target Speech Recognition.

Interspeech2019 Shigeki Karita, Nelson Enrique Yalta Soplin, Shinji Watanabe 0001, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani, 
Improving Transformer-Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration.

Interspeech2019 Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Tomohiro Nakatani, 
Multimodal SpeakerBeam: Single Channel Target Speech Extraction with Audio-Visual Speaker Clues.

Interspeech2019 Atsunori Ogawa, Marc Delcroix, Shigeki Karita, Tomohiro Nakatani, 
Improved Deep Duel Model for Rescoring N-Best Speech Recognition List Using Backward LSTMLM and Ensemble Encoders.

TASLP2018 Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Christian Huemmer 0001, Tomohiro Nakatani, 
Context Adaptive Neural Network Based Acoustic Models for Rapid Adaptation.

ICASSP2018 Marc Delcroix, Katerina Zmolíková, Keisuke Kinoshita, Atsunori Ogawa, Tomohiro Nakatani, 
Single Channel Target Speaker Extraction and Recognition with Speaker Beam.

ICASSP2018 Tsuyoshi Morioka, Naohiro Tawara, Tetsuji Ogawa, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi, 
Language Model Domain Adaptation Via Recurrent Neural Networks with Domain-Shared and Domain-Specific Representations.

#97  | Nicholas W. D. Evans | Google Scholar   DBLP
VenuesInterspeech: 25ICASSP: 8TASLP: 1
Years2022: 32021: 82020: 62019: 42018: 62017: 32016: 4
ISCA Sectionvoice anti-spoofing and countermeasure: 3voice privacy challenge: 3speaker recognition: 3special session: 2spoofing-aware automatic speaker verification (sasv): 1robust speaker recognition: 1privacy-preserving machine learning for audio & speech processing: 1the first dicova challenge: 1graph and end-to-end learning for speaker recognition: 1anti-spoofing and liveness detection: 1privacy in speech and audio interfaces: 1the 2019 automatic speaker verification spoofing and countermeasures challenge: 1novel approaches to enhancement: 1spoken corpora and annotation: 1the first dihard speech diarization challenge: 1speaker verification: 1speaker recognition evaluation: 1robust speaker recognition and anti-spoofing: 1
IEEE Keywordspeaker recognition: 4presentation attack detection: 3artificial bandwidth extension: 3speech quality: 3anti spoofing: 2automatic speaker verification: 2spoofing: 2security of data: 2variational auto encoder: 2latent variable: 2speech coding: 2graph theory: 1end to end: 1graph attention networks: 1audio spoofing detection: 1heterogeneous: 1data augmentation: 1filtering theory: 1transient response: 1signal classification: 1countermeasures: 1public domain software: 1spoofing counter measures: 1automatic speaker verification (asv): 1detect ion cost function: 1mean square error methods: 1generative adversarial network: 1statistical distributions: 1telephony: 1speech recognition: 1regression analysis: 1dimensionality reduction: 1bandwidth extension: 1speech codecs: 1voice quality: 1spectral analysis: 1super wideband: 1information theory: 1computational complexity: 1gaussian mixture model: 1replay: 1speaker verification: 1
Most Publications2021: 242022: 172020: 172019: 152018: 15


ICASSP2022 Jee-weon Jung, Hee-Soo Heo, Hemlata Tak, Hye-jin Shim, Joon Son Chung, Bong-Jin Lee, Ha-Jin Yu, Nicholas W. D. Evans
AASIST: Audio Anti-Spoofing Using Integrated Spectro-Temporal Graph Attention Networks.

ICASSP2022 Hemlata Tak, Madhu R. Kamble, Jose Patino 0001, Massimiliano Todisco, Nicholas W. D. Evans
Rawboost: A Raw Data Boosting and Augmentation Method Applied to Automatic Speaker Verification Anti-Spoofing.

Interspeech2022 Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Ha-Jin Yu, Nicholas W. D. Evans, Tomi Kinnunen, 
SASV 2022: The First Spoofing-Aware Speaker Verification Challenge.

ICASSP2021 Hemlata Tak, Jose Patino 0001, Massimiliano Todisco, Andreas Nautsch, Nicholas W. D. Evans, Anthony Larcher, 
End-to-End anti-spoofing with RawNet2.

Interspeech2021 Jose Patino 0001, Natalia A. Tomashenko, Massimiliano Todisco, Andreas Nautsch, Nicholas W. D. Evans
Speaker Anonymisation Using the McAdams Coefficient.

Interspeech2021 Oubaïda Chouchane, Baptiste Brossier, Jorge Esteban Gamboa Gamboa, Thomas Lardy, Hemlata Tak, Orhan Ermis, Madhu R. Kamble, Jose Patino 0001, Nicholas W. D. Evans, Melek Önen, Massimiliano Todisco, 
Privacy-Preserving Voice Anti-Spoofing Using Secure Multi-Party Computation.

Interspeech2021 Wanying Ge, Michele Panariello, Jose Patino 0001, Massimiliano Todisco, Nicholas W. D. Evans
Partially-Connected Differentiable Architecture Search for Deepfake and Spoofing Detection.

Interspeech2021 Madhu R. Kamble, José Andrés González López, Teresa Grau, Juan M. Espín, Lorenzo Cascioli, Yiqing Huang, Alejandro Gomez-Alanis, Jose Patino 0001, Roberto Font, Antonio M. Peinado, Angel M. Gomez, Nicholas W. D. Evans, Maria A. Zuluaga, Massimiliano Todisco, 
PANACEA Cough Sound-Based Diagnosis of COVID-19 for the DiCOVA 2021 Challenge.

Interspeech2021 Tomi Kinnunen, Andreas Nautsch, Md. Sahidullah, Nicholas W. D. Evans, Xin Wang 0037, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee, 
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing.

Interspeech2021 Hemlata Tak, Jee-weon Jung, Jose Patino 0001, Massimiliano Todisco, Nicholas W. D. Evans
Graph Attention Networks for Anti-Spoofing.

Interspeech2021 Lin Zhang, Xin Wang 0037, Erica Cooper, Junichi Yamagishi, Jose Patino 0001, Nicholas W. D. Evans
An Initial Investigation for Detecting Partially Spoofed Audio.

TASLP2020 Tomi Kinnunen, Héctor Delgado, Nicholas W. D. Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang 0037, Md. Sahidullah, Junichi Yamagishi, Douglas A. Reynolds, 
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals.

ICASSP2020 Pramod B. Bachhav, Massimiliano Todisco, Nicholas W. D. Evans
Artificial Bandwidth Extension Using Conditional Variational Auto-encoders and Adversarial Learning.

Interspeech2020 Andreas Nautsch, Jose Patino 0001, Natalia A. Tomashenko, Junichi Yamagishi, Paul-Gauthier Noé, Jean-François Bonastre, Massimiliano Todisco, Nicholas W. D. Evans
The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment.

Interspeech2020 Paul-Gauthier Noé, Jean-François Bonastre, Driss Matrouf, Natalia A. Tomashenko, Andreas Nautsch, Nicholas W. D. Evans
Speech Pseudonymisation Assessment Using Voice Similarity Matrices.

Interspeech2020 Hemlata Tak, Jose Patino 0001, Andreas Nautsch, Nicholas W. D. Evans, Massimiliano Todisco, 
Spoofing Attack Detection Using the Non-Linear Fusion of Sub-Band Classifiers.

Interspeech2020 Natalia A. Tomashenko, Brij Mohan Lal Srivastava, Xin Wang 0037, Emmanuel Vincent 0001, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Jose Patino 0001, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco, 
Introducing the VoicePrivacy Initiative.

ICASSP2019 Pramod B. Bachhav, Massimiliano Todisco, Nicholas W. D. Evans
Latent Representation Learning for Artificial Bandwidth Extension Using a Conditional Variational Auto-encoder.

Interspeech2019 Andreas Nautsch, Jose Patino 0001, Amos Treiber, Themos Stafylakis, Petr Mizera, Massimiliano Todisco, Thomas Schneider 0003, Nicholas W. D. Evans
Privacy-Preserving Speaker Recognition with Cohort Score Normalisation.

Interspeech2019 Andreas Nautsch, Catherine Jasserand, Els Kindt, Massimiliano Todisco, Isabel Trancoso, Nicholas W. D. Evans
The GDPR & Speech Data: Reflections of Legal and Technology Communities, First Steps Towards a Common Understanding.

#98  | Mads Græsbøll Christensen | Google Scholar   DBLP
VenuesICASSP: 20TASLP: 5Interspeech: 5SpeechComm: 4
Years2022: 22021: 42020: 42019: 62018: 72017: 52016: 6
ISCA Sectionspeech enhancement: 2speech signal analysis and representation: 1single-channel speech enhancement: 1pathological speech and language: 1
IEEE Keywordspeech enhancement: 14filtering theory: 4bayes methods: 4matrix decomposition: 3eigenvalues and eigenfunctions: 3audio signal processing: 3kalman filter: 3noise psd estimation: 3autoregressive models: 3joint diagonalization: 3conjugate gradient methods: 2computational complexity: 2least mean squares methods: 2hidden markov models: 2signal denoising: 2acoustic variables control: 2distortion: 2speech intelligibility: 2hearing aids: 2kalman filters: 2pitch estimation: 2frequency estimation: 2estimation theory: 2wiener filters: 2covariance matrices: 2optimal filtering: 2causality: 1long term linear prediction: 1speech attenuation: 1anc headphones: 1fixed filter anc: 1adaptive filters: 1active noise control: 1feedforward: 1headphones: 1deep representation learning: 1deep learning (artificial intelligence): 1variational autoencoder: 1bayesian permutation training: 1subspace based approach: 1generalized eigenvalue decomposition: 1personal sound zones: 1conjugate gradient method: 1physically meaningful constraints: 1loudspeakers: 1reverberation: 1poisson mixture model (pmm): 1poisson distribution: 1hidden markov model (hmm): 1mixture models: 1minimum mean square error (mmse): 1non negative matrix factorization (nmf): 1generalized analysis by synthesis: 1levinson durbin recursion: 1autoregressive processes: 1parameter estimation: 1auto regressive model: 1dnn: 1conjugate gradient: 1reduced rank: 1variable span trade off filter: 1sound zone control: 1binaural enhancement: 1maximum likelihood estimation: 1autoregressive model: 1markov process: 1fundamental frequency or pitch tracking: 1harmonic model: 1voiced unvoiced detection: 1correlation methods: 1harmonic order: 1markov processes: 1tracking: 1fundamental frequency: 1gross error rate: 1spectral flatness measure: 1pre whitening: 1bayesian nonparametric: 1pattern classification: 1infinite hmm: 1diseases: 1parkinson’s disease: 1patient monitoring: 1segmentation: 1signal classification: 1quality control: 1recursive estimation: 1spectral analysis: 1acoustic field: 1sound reproduction: 1sound zones: 1acoustic distortion: 1variable span linear filter: 1personal sound: 1probability: 1noise statistics: 1medical signal processing: 1gaussian processes: 1remote pathological voice analysis: 1distortion modeling: 1channel factors: 1support vector machines: 1plda: 1svm: 1regression analysis: 1mfcc: 1support vector regression: 1global snr estimation: 1cepstral analysis: 1pathological voice: 1whispered speech: 1speech recognition: 1clustering structured sparsity: 1multi pitch estimation: 1subharmonic errors: 1block sparse bayesian learning: 1multichannel speech enhancement: 1doa mismatch: 1mean square error methods: 1mmse filtering: 1harmonic filters: 1voiced speech: 1microphone arrays: 1least 1 norm cost function: 1speech analysis: 1deconvolution: 1poles and zeros: 1pole zero model: 1sparse deconvolution: 1non intrusive objective intelligibility prediction: 1optimal filters: 1subspace: 1noise reduction: 1span: 1harmonic signal model: 1non stationary speech: 1chirp model: 1babble noise: 1subspace signal processing: 1cocktail party: 1multichannel enhancement: 1tradeoff filter: 1
Most Publications2019: 212016: 212014: 202022: 182013: 18


ICASSP2022 Yurii Iotov, Sidsel Marie Nørholm, Valiantsin Belyi, Mads Dyrholm, Mads Græsbøll Christensen
Computationally Efficient Fixed-Filter ANC for Speech Based on Long-Term Prediction for Headphone Applications.

ICASSP2022 Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen
A Bayesian Permutation Training Deep Representation Learning Method for Speech Enhancement with Variational Autoencoder.

SpeechComm2021 Amir Hossein Poorjam, Mathew Shaji Kavalekalam, Liming Shi, Yordan P. Raykov, Jesper Rindom Jensen, Max A. Little, Mads Græsbøll Christensen
Automatic quality control and enhancement for voice-based remote Parkinson's disease detection.

TASLP2021 Liming Shi, Taewoong Lee, Lijun Zhang 0004, Jesper Kjær Nielsen, Mads Græsbøll Christensen
Generation of Personal Sound Zones With Physical Meaningful Constraints and Conjugate Gradient Method.

ICASSP2021 Yang Xiang, Liming Shi, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen
A Novel NMF-HMM Speech Enhancement Algorithm Based on Poisson Mixture Model.

Interspeech2021 Alfredo Esquivel Jaramillo, Jesper Kjær Nielsen, Mads Græsbøll Christensen
Speech Decomposition Based on a Hybrid Speech Model and Optimal Segmentation.

SpeechComm2020 Jesper Rindom Jensen, Sam Karimian-Azari, Mads Græsbøll Christensen, Jacob Benesty, 
Harmonic beamformers for speech enhancement and dereverberation in the time domain.

ICASSP2020 Zihao Cui, Changchun Bao, Jesper Kjær Nielsen, Mads Græsbøll Christensen
Autoregressive Parameter Estimation with Dnn-Based Pre-Processing.

ICASSP2020 Liming Shi, Taewoong Lee, Lijun Zhang, Jesper Kjær Nielsen, Mads Græsbøll Christensen
A Fast Reduced-Rank Sound Zone Control Algorithm Using The Conjugate Gradient Method.

Interspeech2020 Yang Xiang, Liming Shi, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen
An NMF-HMM Speech Enhancement Method Based on Kullback-Leibler Divergence.

TASLP2019 Mathew Shaji Kavalekalam, Jesper Kjær Nielsen, Jesper Bünsow Boldt, Mads Græsbøll Christensen
Model-Based Speech Enhancement for Intelligibility Improvement in Binaural Hearing Aids.

TASLP2019 Liming Shi, Jesper Kjær Nielsen, Jesper Rindom Jensen, Max A. Little, Mads Græsbøll Christensen
Robust Bayesian Pitch Tracking Based on the Harmonic Model.

ICASSP2019 Alfredo Esquivel Jaramillo, Jesper Kjær Nielsen, Mads Græsbøll Christensen
A Study on How Pre-whitening Influences Fundamental Frequency Estimation.

ICASSP2019 Amir Hossein Poorjam, Yordan P. Raykov, Reham Badawy, Jesper Rindom Jensen, Mads Græsbøll Christensen, Max A. Little, 
Quality Control of Voice Recordings in Remote Parkinson's Disease Monitoring Using the Infinite Hidden Markov Model.

Interspeech2019 Charlotte Sørensen, Jesper Bünsow Boldt, Mads Græsbøll Christensen
Harmonic Beamformers for Non-Intrusive Speech Intelligibility Prediction.

Interspeech2019 Charlotte Sørensen, Jesper Bünsow Boldt, Mads Græsbøll Christensen
Validation of the Non-Intrusive Codebook-Based Short Time Objective Intelligibility Metric for Processed Speech.

SpeechComm2018 Charlotte Sørensen, Mathew Shaji Kavalekalam, Angeliki Xenaki, Jesper Bünsow Boldt, Mads Græsbøll Christensen, 
Non-intrusive codebook-based intelligibility prediction.

ICASSP2018 Mathew Shaji Kavalekalam, Jesper Kjær Nielsen, Mads Græsbøll Christensen, Jesper Bünsow Boldt, 
A Study of Noise PSD Estimators for Single Channel Speech Enhancement.

ICASSP2018 Taewoong Lee, Jesper Kjær Nielsen, Jesper Rindom Jensen, Mads Græsbøll Christensen
A Unified Approach to Generating Sound Zones Using Variable Span Linear Filters.

ICASSP2018 Jesper Kjær Nielsen, Mathew Shaji Kavalekalam, Mads Græsbøll Christensen, Jesper Bünsow Boldt, 
Model-Based Noise PSD Estimation from Speech in Non-Stationary Noise.

#99  | Yonghui Wu | Google Scholar   DBLP
VenuesICASSP: 15Interspeech: 15ICLR: 2ICML: 1NeurIPS: 1
Years2022: 32021: 62020: 82019: 92018: 62017: 2
ISCA Sectionspeech synthesis: 6asr neural network architectures: 2speech translation: 2asr neural network architectures and training: 1training strategies for asr: 1cross-lingual and multilingual asr: 1application of asr in medical practice: 1end-to-end speech recognition: 1
IEEE Keywordspeech recognition: 11speech synthesis: 6recurrent neural nets: 6speech coding: 4natural language processing: 3data augmentation: 3speaker recognition: 2conformer: 2latency: 2rnn t: 2optimisation: 2regression analysis: 2probability: 2end to end speech recognition: 2vocabulary: 2tacotron 2: 2text to speech: 2rnnt: 1two pass asr: 1long form asr: 1end to end asr: 1iterative methods: 1self attention: 1vae: 1non autoregressive: 1text analysis: 1autoregressive processes: 1computational complexity: 1neural tts: 1cascaded encoders: 1endpointer: 1multi domain training: 1fine grained vae: 1hierarchical: 1mobile handsets: 1variational autoencoder: 1adversarial training: 1text to speech synthesis: 1multilingual: 1end to end speech synthesis: 1decoding: 1neural net architecture: 1wavenet: 1vocoders: 1waveform analysis: 1
Most Publications2020: 352019: 352022: 322021: 302018: 30

Affiliations
URLs

ICASSP2022 Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.

Interspeech2022 Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, Tom Kenter, Alexey Petelin, Jonathan Shen, Vincent Wan, Yu Zhang 0033, Yonghui Wu, Rob Clark, 
Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks.

ICML2022 Chung-Cheng Chiu, James Qin, Yu Zhang, Jiahui Yu, Yonghui Wu
Self-supervised learning with random-projection quantizer for speech recognition.

ICASSP2021 Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang 0033, Ye Jia, Ron J. Weiss, Yonghui Wu
Parallel Tacotron: Non-Autoregressive and Controllable TTS.

ICASSP2021 Bo Li 0028, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han 0002, Qiao Liang, Yu Zhang 0033, Trevor Strohman, Yonghui Wu
A Better and Faster end-to-end Model for Streaming ASR.

ICASSP2021 Jiahui Yu, Chung-Cheng Chiu, Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han 0002, Anmol Gulati, Yonghui Wu, Ruoming Pang, 
FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.

Interspeech2021 Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang 0033, Ye Jia, R. J. Skerry-Ryan, Yonghui Wu
Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling.

Interspeech2021 Ye Jia, Heiga Zen, Jonathan Shen, Yu Zhang 0033, Yonghui Wu
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS.

ICLR2021 Jiahui Yu, Wei Han 0002, Anmol Gulati, Chung-Cheng Chiu, Bo Li 0028, Tara N. Sainath, Yonghui Wu, Ruoming Pang, 
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling.

ICASSP2020 Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Ruoming Pang, Yanzhang He, Trevor Strohman, Yonghui Wu
Towards Fast and Accurate Streaming End-To-End ASR.

ICASSP2020 Daniel S. Park, Yu Zhang 0033, Chung-Cheng Chiu, Youzheng Chen, Bo Li 0028, William Chan, Quoc V. Le, Yonghui Wu
Specaugment on Large Scale Datasets.

ICASSP2020 Tara N. Sainath, Yanzhang He, Bo Li 0028, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li 0133, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alexander Gruenstein, Ke Hu, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirkó Visontai, Yonghui Wu, Yu Zhang 0033, Ding Zhao, 
A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency.

ICASSP2020 Guangzhi Sun, Yu Zhang 0033, Ron J. Weiss, Yuan Cao 0007, Heiga Zen, Yonghui Wu
Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis.

ICASSP2020 Gary Wang, Andrew Rosenberg, Zhehuai Chen, Yu Zhang 0033, Bhuvana Ramabhadran, Yonghui Wu, Pedro J. Moreno 0001, 
Improving Speech Recognition Using Consistent Predictions on Synthesized Speech.

Interspeech2020 Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang 0033, Jiahui Yu, Wei Han 0002, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang, 
Conformer: Convolution-augmented Transformer for Speech Recognition.

Interspeech2020 Wei Han 0002, Zhengdong Zhang, Yu Zhang 0033, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context.

Interspeech2020 Daniel S. Park, Yu Zhang 0033, Ye Jia, Wei Han 0002, Chung-Cheng Chiu, Bo Li 0028, Yonghui Wu, Quoc V. Le, 
Improved Noisy Student Training for Automatic Speech Recognition.

ICASSP2019 Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, Qiao Liang, Deepti Bhatia, Yuan Shangguan, Bo Li 0028, Golan Pundak, Khe Chai Sim, Tom Bagby, Shuo-Yiin Chang, Kanishka Rao, Alexander Gruenstein, 
Streaming End-to-end Speech Recognition for Mobile Devices.

ICASSP2019 Wei-Ning Hsu, Yu Zhang 0033, Ron J. Weiss, Yu-An Chung, Yuxuan Wang 0002, Yonghui Wu, James R. Glass, 
Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization.

ICASSP2019 Bo Li 0028, Yu Zhang 0033, Tara N. Sainath, Yonghui Wu, William Chan, 
Bytes Are All You Need: End-to-end Multilingual Speech Recognition and Synthesis with Bytes.

#100  | Eliathamby Ambikairajah | Google Scholar   DBLP
Venues: Interspeech (23), ICASSP (8), SpeechComm (2)
Years: 2021 (3), 2020 (2), 2019 (6), 2018 (8), 2017 (7), 2016 (7)
ISCA Sections: spoofing detection (4); language recognition (3); speaker recognition and anti-spoofing (2); tools, corpora and resources (1); emotion and sentiment analysis (1); training strategy for speech emotion recognition (1); language identification (1); speech analysis and representation (1); emotion modeling (1); speaker and language recognition applications (1); speaker recognition evaluation (1); short utterances speaker recognition (1); speaker database and anti-spoofing (1); behavioral signal processing and speaker state and traits analytics (1); special session (1); speaker recognition (1); robust speaker recognition and anti-spoofing (1)
IEEE Keywords: speaker recognition (5); replay attack (2); speaker verification (2); channel bank filters (2); automatic speaker verification (2); language identification (2); i vector (2); educational courses (1); computer aided instruction (1); electrical engineering education (1); electrical engineering computing (1); dsp education (1); cochlear models (1); filter bank (1); project based learning (1); replay detection (1); voice biometrics (1); biometrics (access control) (1); adversarial networks (1); multi task deep learning (1); speaker normalization (1); speech recognition (1); digital filters (1); transmission line cochlear model (1); frequency modulation (1); amplitude modulation (1); spoofing (1); phoneme posterior weighted score (1); phoneme detection (1); spoofing detection (1); band pass filters (1); anti spoofing (1); asvspoof 2017 (1); spatial differentiation (1); iir filters (1); security of data (1); signal detection (1); bidirectional lstm (1); dnn adaptation (1); recurrent neural nets (1); audio recording (1); factorized hidden variability learning (1); probability (1); normal distribution (1); short duration speaker verification (1); phonetic variability (1); speaker phonetic vector (1); parameter estimation (1); pattern classification (1); pllr (1); natural language processing (1); hierarchical framework (1)
Most Publications: 2008 (16), 2007 (16), 2018 (15), 2017 (14), 2019 (12)


SpeechComm2021 Tharshini Gunendradasan, Eliathamby Ambikairajah, Julien Epps, Vidhyasaharan Sethu, Haizhou Li 0001, 
An adaptive transmission line cochlear model based front-end for replay attack detection.

Interspeech2021 Beena Ahmed, Kirrie J. Ballard, Denis Burnham, Tharmakulasingam Sirojan, Hadi Mehmood, Dominique Estival, Elise Baker, Felicity Cox, Joanne Arciuli, Titia Benders, Katherine Demuth, Barbara Kelly, Chloé Diskin-Holdaway, Mostafa Ali Shahin, Vidhyasaharan Sethu, Julien Epps, Chwee Beng Lee, Eliathamby Ambikairajah
AusKidTalk: An Auditory-Visual Corpus of 3- to 12-Year-Old Australian Children's Speech.

Interspeech2021 Deboshree Bose, Vidhyasaharan Sethu, Eliathamby Ambikairajah
Parametric Distributions to Model Numerical Emotion Labels.

ICASSP2020 Eliathamby Ambikairajah, Vidhyasaharan Sethu, 
Cochlear Signal Processing: A Platform for Learning the Fundamentals of Digital Signal Processing.

ICASSP2020 Gajan Suthokumar, Vidhyasaharan Sethu, Kaavya Sriskandaraja, Eliathamby Ambikairajah
Adversarial Multi-Task Learning for Speaker Normalization in Replay Detection.

ICASSP2019 Tharshini Gunendradasan, Saad Irtza, Eliathamby Ambikairajah, Julien Epps, 
Transmission Line Cochlear Model Based AM-FM Features for Replay Attack Detection.

ICASSP2019 Gajan Suthokumar, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Chamith Wijenayake, Eliathamby Ambikairajah
Phoneme Specific Modelling and Scoring Techniques for Anti Spoofing System.

ICASSP2019 Buddhi Wickramasinghe, Eliathamby Ambikairajah, Julien Epps, Vidhyasaharan Sethu, Haizhou Li 0001, 
Auditory Inspired Spatial Differentiation for Replay Spoofing Attack Detection.

Interspeech2019 Tharshini Gunendradasan, Eliathamby Ambikairajah, Julien Epps, Haizhou Li 0001, 
An Adaptive-Q Cochlear Model for Replay Spoofing Detection.

Interspeech2019 Anda Ouyang, Ting Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah
Speech Based Emotion Prediction: Can a Linear Model Work?

Interspeech2019 Buddhi Wickramasinghe, Eliathamby Ambikairajah, Julien Epps, 
Biologically Inspired Adaptive-Q Filterbanks for Replay Spoofing Attack Detection.

SpeechComm2018 Saad Irtza, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Haizhou Li 0001, 
Using language cluster models in hierarchical language identification.

ICASSP2018 Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah
Factorized Hidden Variability Learning for Adaptation of Short Duration Language Identification Models.

ICASSP2018 Jianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong-Aik Lee, 
Speaker-Phonetic Vector Estimation for Short Duration Speaker Verification.

Interspeech2018 Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah
Sub-band Envelope Features Using Frequency Domain Linear Prediction for Short Duration Language Identification.

Interspeech2018 Tharshini Gunendradasan, Buddhi Wickramasinghe, Phu Ngoc Le, Eliathamby Ambikairajah, Julien Epps, 
Detection of Replay-Spoofing Attacks Using Frequency Modulation Features.

Interspeech2018 Kaavya Sriskandaraja, Vidhyasaharan Sethu, Eliathamby Ambikairajah
Deep Siamese Architecture Based Replay Detection for Secure Voice Biometric.

Interspeech2018 Gajan Suthokumar, Vidhyasaharan Sethu, Chamith Wijenayake, Eliathamby Ambikairajah
Modulation Dynamic Features for the Detection of Replay Attacks.

Interspeech2018 Buddhi Wickramasinghe, Saad Irtza, Eliathamby Ambikairajah, Julien Epps, 
Frequency Domain Linear Prediction Features for Replay Spoofing Attack Detection.

Interspeech2017 Siyuan Chen, Julien Epps, Eliathamby Ambikairajah, Phu Ngoc Le, 
An Investigation of Crowd Speech for Room Occupancy Estimation.

#101  | Peter Birkholz | Google Scholar   DBLP
Venues: Interspeech (21), ICASSP (5), SpeechComm (4), TASLP (3)
Years: 2023 (1), 2022 (12), 2021 (4), 2020 (6), 2019 (3), 2018 (2), 2017 (3), 2016 (2)
ISCA Sections: speech synthesis (3); speech production (3); speech processing & measurement (2); show and tell (2); miscellaneous topics in speech, voice and hearing disorders (1); phonetics (1); human speech & signal processing (1); pathological speech assessment (1); tonal aspects of acoustic phonetics and prosody (1); language learning (1); special session (1); stance, credibility, and deception (1); speech analysis and representation (1); speech analysis (1); show & tell session 6 (1)
IEEE Keywords: speech synthesis (5); articulatory synthesis (2); natural language processing (2); silent speech (2); mean square error methods (2); speech recognition (2); pattern clustering (1); german phonology (1); r allophones in german (1); acoustic resonances (1); vocal tract walls (1); transfer function measurement (1); acoustic resonance (1); voice activity detection (1); silicon (1); carina (1); data handling (1); prosodic annotation (1); speech data (1); co intrinsic f0 variation (1); intrinsic f0 variation (1); approximation theory (1); target approximation model (1); pitch modeling (1); acoustic variables measurement (1); ssi (1); electro optical stomatography (1); speaker recognition (1); eos (1); regression analysis (1); support vector machines (1); intonation modeling (1); articulation to speech synthesis (1); vivaldi antennas (1); nearest neighbour methods (1); silent speech interface (1); microwave detectors (1); speech production (1); vocal tract models (1)
Most Publications: 2022 (18), 2021 (10), 2020 (10), 2018 (6), 2017 (5)


SpeechComm2023 Daniel R. van Niekerk, Anqi Xu, Branislav Gerazov, Paul Konstantin Krug, Peter Birkholz, Lorna F. Halliday, Santitham Prom-on, Yi Xu 0007, 
Simulating vocal learning of spoken language: Beyond imitation.

TASLP2022 Simon Stone, Yingming Gao, Peter Birkholz
Articulatory Synthesis of Vocalized /r/ Allophones in German.

ICASSP2022 Peter Birkholz, P. Häsner, Steffen Kürbis, 
Acoustic Comparison of Physical Vocal Tract Models with Hard and Soft Walls.

ICASSP2022 Hannes Kath, Simon Stone, Stefan Rapp, Peter Birkholz
Carina - A Corpus of Aligned German Read Speech Including Annotations.

Interspeech2022 Pouriya Amini Digehsara, João Vítor Possamai de Menezes, Christoph Wagner, Michael Bärhold, Petr Schaffer, Dirk Plettemeier, Peter Birkholz
A user-friendly headset for radar-based silent speech recognition.

Interspeech2022 Arne-Lukas Fietkau, Simon Stone, Peter Birkholz
Relationship between the acoustic time intervals and tongue movements of German diphthongs.

Interspeech2022 Paul Konstantin Krug, Peter Birkholz, Branislav Gerazov, Daniel Rudolph van Niekerk, Anqi Xu, Yi Xu, 
Articulatory Synthesis for Data Augmentation in Phoneme Recognition.

Interspeech2022 Ingo Langheinrich, Simon Stone, Xinyu Zhang, Peter Birkholz
Glottal inverse filtering based on articulatory synthesis and deep learning.

Interspeech2022 Leon Liebig, Christoph Wagner, Alexander Mainka, Peter Birkholz
An investigation of regression-based prediction of the femininity or masculinity in speech of transgender people.

Interspeech2022 João Vítor Menezes, Pouriya Amini Digehsara, Christoph Wagner, Marco Mütze, Michael Bärhold, Petr Schaffer, Dirk Plettemeier, Peter Birkholz
Evaluation of different antenna types and positions in a stepped frequency continuous-wave radar-based silent speech interface.

Interspeech2022 Debasish Ray Mohapatra, Mario Fleischer, Victor Zappi, Peter Birkholz, Sidney S. Fels, 
Three-dimensional finite-difference time-domain acoustic analysis of simplified vocal tract shapes.

Interspeech2022 Daniel R. van Niekerk, Anqi Xu, Branislav Gerazov, Paul Konstantin Krug, Peter Birkholz, Yi Xu, 
Exploration strategies for articulatory synthesis of complex syllable onsets.

Interspeech2022 Yi Xu, Anqi Xu, Daniel R. van Niekerk, Branislav Gerazov, Peter Birkholz, Paul Konstantin Krug, Santitham Prom-on, Lorna F. Halliday, 
Evoc-Learn - High quality simulation of early vocal learning.

SpeechComm2021 Peter Birkholz, Susanne Drechsel, 
Effects of the piriform fossae, transvelar acoustic coupling, and laryngeal wall vibration on the naturalness of articulatory speech synthesis.

Interspeech2021 Rémi Blandin, Marc Arnela, Simon Félix, Jean-Baptiste Doc, Peter Birkholz
Comparison of the Finite Element Method, the Multimodal Method and the Transmission-Line Model for the Computation of Vocal Tract Transfer Functions.

Interspeech2021 Alexander Wilbrandt, Simon Stone, Peter Birkholz
Articulatory Data Recorder: A Framework for Real-Time Articulatory Data Recording.

Interspeech2021 Anqi Xu, Daniel R. van Niekerk, Branislav Gerazov, Paul Konstantin Krug, Santitham Prom-on, Peter Birkholz, Yi Xu, 
Model-Based Exploration of Linking Between Vowel Articulatory Space and Acoustic Space.

SpeechComm2020 Thuan Van Ngo, Masato Akagi, Peter Birkholz
Effect of articulatory and acoustic features on the intelligibility of speech in noise: An articulatory synthesis study.

ICASSP2020 Peter Birkholz, Xinyu Zhang, 
Accounting for Microprosody in Modeling Intonation.

ICASSP2020 Simon Stone, Peter Birkholz
Cross-Speaker Silent-Speech Command Word Recognition Using Electro-Optical Stomatography.

#102  | Takuya Yoshioka | Google Scholar   DBLP
Venues: Interspeech (20), ICASSP (13)
Years: 2022 (10), 2021 (9), 2020 (6), 2019 (2), 2018 (2), 2016 (4)
ISCA Sections: source separation (3); robust asr, and far-field/multi-talker asr (2); single-channel speech enhancement (2); other topics in speech recognition (1); applications in transcription, education and learning (1); multi- and cross-lingual asr, other topics in asr (1); asr neural network architectures (1); training strategies for asr (1); noise robust and distant speech recognition (1); multi-channel speech enhancement (1); rich transcription and asr systems (1); source separation from monaural input (1); distant asr (1); acoustic model adaptation (1); far-field, robustness and adaptation (1); speech enhancement and noise reduction (1)
IEEE Keywords: speech recognition (10); speaker recognition (5); speech enhancement (4); continuous speech separation (4); speech separation (4); automatic speech recognition (3); audio signal processing (3); recurrent neural nets (3); source separation (3); perceptual speech quality (2); speaker counting (2); speaker diarization (2); natural language processing (2); meeting transcription (2); transformer (2); iterative methods (1); p.835 (1); personalized noise suppression (1); signal denoising (1); deep noise suppression (1); teleconferencing (1); personalized speech enhancement (1); speech intelligibility (1); speaker embedding (1); voice activity detection (1); rich transcription (1); robust automatic speech recognition (1); self supervised learning (1); recurrent selective attention network (1); signal representation (1); multi channel microphone (1); deep learning (artificial intelligence) (1); conformer (1); multi speaker asr (1); probability (1); bayes methods (1); minimum bayes risk training (1); speaker identification (1); filtering theory (1); system fusion (1); libricss (1); microphones (1); overlapped speech (1); permutation invariant training (1); time domain (1); recurrent neural networks (1); frequency domain analysis (1); array signal processing (1); speaker independent speech separation (1); microphone arrays (1); convolutional neural network (1); parametric rectified linear unit (1); image classification (1); computer vision (1); noise robustness (1)
Most Publications: 2022 (34), 2021 (31), 2020 (18), 2019 (12), 2013 (10)


ICASSP2022 Harishchandra Dubey, Vishak Gopal, Ross Cutler, Ashkan Aazami, Sergiy Matusevych, Sebastian Braun, Sefik Emre Eskimez, Manthan Thakker, Takuya Yoshioka, Hannes Gamper, Robert Aichner, 
Icassp 2022 Deep Noise Suppression Challenge.

ICASSP2022 Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang 0009, Zhuo Chen 0006, Xuedong Huang 0001, 
Personalized speech enhancement: new models and Comprehensive evaluation.

ICASSP2022 Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.

ICASSP2022 Heming Wang, Yao Qian, Xiaofei Wang 0009, Yiming Wang, Chengyi Wang 0002, Shujie Liu 0001, Takuya Yoshioka, Jinyu Li 0001, DeLiang Wang, 
Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction.

ICASSP2022 Yixuan Zhang, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li 0001, 
Continuous Speech Separation with Recurrent Selective Attention Network.

Interspeech2022 Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.

Interspeech2022 Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.

Interspeech2022 Manthan Thakker, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, 
Fast Real-time Personalized Speech Enhancement: End-to-End Enhancement Network (E3Net) and Knowledge Distillation.

Interspeech2022 Xiaofei Wang 0009, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation.

Interspeech2022 Wangyou Zhang, Zhuo Chen 0006, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei, 
Separating Long-Form Speech with Group-wise Permutation Invariant Training.

ICASSP2021 Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Takuya Yoshioka, Shujie Liu 0001, Jin-Yu Li 0001, Xiangzhan Yu, 
Don't Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer.

ICASSP2021 Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Jinyu Li 0001, Takuya Yoshioka, Chengyi Wang 0002, Shujie Liu 0001, Ming Zhou 0001, 
Continuous Speech Separation with Conformer.

ICASSP2021 Naoyuki Kanda, Zhong Meng, Liang Lu 0001, Yashesh Gaur, Xiaofei Wang 0009, Zhuo Chen 0006, Takuya Yoshioka
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.

ICASSP2021 Xiong Xiao, Naoyuki Kanda, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao 0008, Gang Liu, Yu Wu 0012, Jian Wu 0027, Shujie Liu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.

Interspeech2021 Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Shujie Liu 0001, Jinyu Li 0001, Xiangzhan Yu, 
Ultra Fast Speech Separation Model with Teacher Student Learning.

Interspeech2021 Sefik Emre Eskimez, Xiaofei Wang 0009, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen 0006, Huaming Wang, Takuya Yoshioka
Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement.

Interspeech2021 Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka
End-to-End Speaker-Attributed ASR with Transformer.

Interspeech2021 Naoyuki Kanda, Guoli Ye, Yu Wu 0012, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone.

Interspeech2021 Jian Wu 0027, Zhuo Chen 0006, Sanyuan Chen, Yu Wu 0012, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, 
Investigation of Practical Aspects of Single Channel Speech Separation for ASR.

ICASSP2020 Zhuo Chen 0006, Takuya Yoshioka, Liang Lu 0001, Tianyan Zhou, Zhong Meng, Yi Luo 0004, Jian Wu 0027, Xiong Xiao, Jinyu Li 0001, 
Continuous Speech Separation: Dataset and Analysis.

#103  | Zhong Meng | Google Scholar   DBLP
Venues: Interspeech (21), ICASSP (12)
Years: 2022 (6), 2021 (9), 2020 (6), 2019 (5), 2018 (3), 2017 (2), 2016 (2)
ISCA Sections: multi- and cross-lingual asr, other topics in asr (3); other topics in speech recognition (1); robust asr, and far-field/multi-talker asr (1); novel models and training methods for asr (1); source separation (1); self-supervision and semi-supervision for neural asr training (1); applications in transcription, education and learning (1); neural network training methods for asr (1); asr neural network architectures (1); training strategies for asr (1); asr neural network architectures and training (1); spoken term detection, confidence measure, and end-to-end speech recognition (1); asr neural network training (1); novel approaches to enhancement (1); deep enhancement (1); topic spotting, entity extraction and semantic analysis (1); discriminative training for asr (1); spoken term detection (1); robust speaker recognition and anti-spoofing (1)
IEEE Keywords: speech recognition (11); speaker recognition (4); automatic speech recognition (4); probability (3); natural language processing (3); deep neural network (3); adversarial learning (3); speaker counting (2); audio signal processing (2); continuous speech separation (2); recurrent neural nets (2); teacher student learning (2); neural network (2); speaker diarization (1); voice activity detection (1); rich transcription (1); meeting transcription (1); source separation (1); recurrent selective attention network (1); bayes methods (1); minimum bayes risk training (1); speech separation (1); speaker identification (1); recurrent neural network transducer (1); attention based encoder decoder (1); language model (1); regularization (1); sequence training (1); self teaching (1); libricss (1); microphones (1); overlapped speech (1); permutation invariant training (1); lstm (1); latency (1); computer aided instruction (1); entropy (1); domain adaptation (1); backpropagation (1); knowledge representation (1); label embedding (1); speaker adaptation (1); domain invariant training (1); attention (1); speaker verification (1); speaker invariant training (1); adversarial learning (1); deep neural networks (1)
Most Publications: 2021 (27), 2020 (21), 2019 (14), 2022 (11), 2018 (9)


ICASSP2022 Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.

ICASSP2022 Yixuan Zhang, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li 0001, 
Continuous Speech Separation with Recurrent Selective Attention Network.

Interspeech2022 Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.

Interspeech2022 Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.

Interspeech2022 Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li 0001, Xie Chen 0001, Yu Wu 0012, Yifan Gong 0001, 
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.

Interspeech2022 Wangyou Zhang, Zhuo Chen 0006, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei, 
Separating Long-Form Speech with Group-wise Permutation Invariant Training.

ICASSP2021 Naoyuki Kanda, Zhong Meng, Liang Lu 0001, Yashesh Gaur, Xiaofei Wang 0009, Zhuo Chen 0006, Takuya Yoshioka, 
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.

ICASSP2021 Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu 0001, Xie Chen 0001, Jinyu Li 0001, Yifan Gong 0001, 
Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition.

ICASSP2021 Eric Sun, Liang Lu 0001, Zhong Meng, Yifan Gong 0001, 
Sequence-Level Self-Teaching Regularization.

Interspeech2021 Liang Lu 0001, Zhong Meng, Naoyuki Kanda, Jinyu Li 0001, Yifan Gong 0001, 
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer.

Interspeech2021 Yan Deng, Rui Zhao 0017, Zhong Meng, Xie Chen 0001, Bing Liu, Jinyu Li 0001, Yifan Gong 0001, Lei He 0005, 
Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS.

Interspeech2021 Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
End-to-End Speaker-Attributed ASR with Transformer.

Interspeech2021 Naoyuki Kanda, Guoli Ye, Yu Wu 0012, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone.

Interspeech2021 Zhong Meng, Yu Wu 0012, Naoyuki Kanda, Liang Lu 0001, Xie Chen 0001, Guoli Ye, Eric Sun, Jinyu Li 0001, Yifan Gong 0001, 
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition.

Interspeech2021 Eric Sun, Jinyu Li 0001, Zhong Meng, Yu Wu 0012, Jian Xue, Shujie Liu 0001, Yifan Gong 0001, 
Improving Multilingual Transformer Transducer Models by Reducing Language Confusions.

ICASSP2020 Zhuo Chen 0006, Takuya Yoshioka, Liang Lu 0001, Tianyan Zhou, Zhong Meng, Yi Luo 0004, Jian Wu 0027, Xiong Xiao, Jinyu Li 0001, 
Continuous Speech Separation: Dataset and Analysis.

ICASSP2020 Jinyu Li 0001, Rui Zhao 0017, Eric Sun, Jeremy H. M. Wong, Amit Das, Zhong Meng, Yifan Gong 0001, 
High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model.

ICASSP2020 Zhong Meng, Hu Hu, Jinyu Li 0001, Changliang Liu, Yan Huang 0028, Yifan Gong 0001, Chin-Hui Lee, 
L-Vector: Neural Label Embedding for Domain Adaptation.

Interspeech2020 Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, 
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers.

Interspeech2020 Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Takuya Yoshioka, 
Serialized Output Training for End-to-End Overlapped Speech Recognition.

#104  | Odette Scharenborg | Google Scholar   DBLP
Venues: Interspeech (21), ICASSP (5), SpeechComm (4), TASLP (2)
Years: 2023 (1), 2022 (7), 2021 (5), 2020 (6), 2019 (4), 2018 (5), 2016 (4)
ISCA Sections: speech and voice disorders (2); applications of asr (2); spoken word recognition (2); non-native speech perception (2); spoken language processing (1); low-resource asr development (1); technology for disordered speech (1); multi-, cross-lingual and other topics in asr (1); spoken dialogue systems and multimodality (1); low-resource speech recognition (1); topics in asr (1); phonetic event detection and segmentation (1); neural networks for language modeling (1); speech in the brain (1); signal analysis for the natural, biological and social sciences (1); speech perception in adverse conditions (1); deep neural networks (1)
IEEE Keywords: natural language processing (5); automatic speech recognition (3); text analysis (3); speech synthesis (3); speech recognition (3); decoding (2); image retrieval (2); misp challenge (1); audio visual systems (1); wake word spotting (1); public domain software (1); audio visual (1); speaker recognition (1); microphone array (1); supervised learning (1); adversarial learning (1); speech embedding (1); speech to image generation (1); multimodal modelling (1); image recognition (1); multilingual (1); phonotactics (1); zero shot learning (1); encoder decoder (1); sequence to sequence (1); image captioning (1); image to speech (1); human computer interaction (1); image representation (1); unsupervised learning (1); low resource asr (1); bayes methods (1); acoustic unit discovery (1); bayesian model (1); informative prior (1); multi modal data (1); unwritten languages (1); unsupervised unit discovery (1); machine translation (1); linguistics (1)
Most Publications: 2021 (18), 2020 (17), 2022 (15), 2018 (10), 2019 (8)


SpeechComm2023 Bence Mark Halpern, Siyuan Feng 0001, Rob van Son, Michiel W. M. van den Brekel, Odette Scharenborg
Automatic evaluation of spontaneous oral cancer speech using ratings from naive listeners.

SpeechComm2022 Bence Mark Halpern, Siyuan Feng 0001, Rob van Son, Michiel W. M. van den Brekel, Odette Scharenborg
Low-resource automatic speech recognition and error analyses of oral cancer speech.

ICASSP2022 Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
The First Multimodal Information Based Speech Processing (Misp) Challenge: Data, Tasks, Baselines And Results.

Interspeech2022 Hang Chen, Jun Du, Yusheng Dai, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Baocai Yin, Jia Pan, 
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.

Interspeech2022 Tanvina Patel, Odette Scharenborg
Using cross-model learnings for the Gram Vaani ASR Challenge 2022.

Interspeech2022 Luke Prananta, Bence Mark Halpern, Siyuan Feng 0001, Odette Scharenborg
The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition.

Interspeech2022 Yuanyuan Zhang, Yixuan Zhang, Bence Mark Halpern, Tanvina Patel, Odette Scharenborg
Mitigating bias against non-native accents.

Interspeech2022 Hengshun Zhou, Jun Du, Gongzhen Zou, Zhaoxu Nian, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Shifu Xiong, Jianqing Gao, 
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis.

SpeechComm2021 Polina Drozdova, Roeland van Hout, Sven L. Mattys, Odette Scharenborg
The effect of intermittent noise on lexically-guided perceptual learning in native and non-native listening.

TASLP2021 Xinsheng Wang, Tingting Qiao, Jihua Zhu, Alan Hanjalic, Odette Scharenborg
Generating Images From Spoken Descriptions.

ICASSP2021 Siyuan Feng 0001, Piotr Zelasko, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak, 
How Phonotactics Affect Multilingual and Zero-Shot ASR Performance.

ICASSP2021 Xinsheng Wang, Siyuan Feng 0001, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg
Show and Speak: Directly Synthesize Spoken Description of Images.

Interspeech2021 Siyuan Feng 0001, Piotr Zelasko, Laureano Moro-Velázquez, Odette Scharenborg
Unsupervised Acoustic Unit Discovery by Leveraging a Language-Independent Subword Discriminative Feature Representation.

TASLP2020 Odette Scharenborg, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller 0001, 
Speech Technology for Unwritten Languages.

Interspeech2020 Siyuan Feng 0001, Odette Scharenborg
Unsupervised Subword Modeling Using Autoregressive Pretraining and Cross-Lingual Phone-Aware Modeling.

Interspeech2020 Bence Mark Halpern, Rob van Son, Michiel W. M. van den Brekel, Odette Scharenborg
Detecting and Analysing Spontaneous Oral Cancer Speech in the Wild.

Interspeech2020 Justin van der Hout, Zoltán D'Haese, Mark Hasegawa-Johnson, Odette Scharenborg
Evaluating Automatically Generated Phoneme Captions for Images.

Interspeech2020 Xinsheng Wang, Tingting Qiao, Jihua Zhu, Alan Hanjalic, Odette Scharenborg
S2IGAN: Speech-to-Image Generation via Adversarial Learning.

Interspeech2020 Piotr Zelasko, Laureano Moro-Velázquez, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak, 
That Sounds Familiar: An Analysis of Phonetic Representations Transfer Across Languages.

SpeechComm2019 Odette Scharenborg, Marjolein van Os, 
Why listening in background noise is harder in a non-native language than in a native language: A review.

#105  | Tanja Schultz | Google Scholar   DBLP
Venues: Interspeech (23), ICASSP (8), SpeechComm (1)
Years: 2022 (10), 2021 (5), 2020 (10), 2019 (2), 2018 (3), 2017 (1), 2016 (1)
ISCA Sections: speech and language in health (2); source separation (2); health and affect (2); cross/multi-lingual and code-switched speech recognition (2); novel paradigms for direct synthesis based on speech-related biosignals (2); speech synthesis (1); target speaker detection, localization and separation (1); new trends in self-supervised speech processing (1); neural signals for spoken communication (1); speech in multimodality (1); computational paralinguistics (1); human speech production (1); acoustic phonetics of l1-l2 and other interactions (1); applications of language technologies (1); keynote (1); speech and language analytics for mental health (1); disorders related to speech and language (1); special session (1)
IEEE Keywords: natural language processing (6); speech recognition (6); ethiopian languages (3); gaussian processes (2); language translation (2); hearing (2); globalphone (2); vocabulary (2); deep neural networks (2); acoustic and linguistic features (1); adress challenge (1); diseases (1); handicapped aids (1); ilse corpus (1); speech & language (1); support vector machines (1); alzheimer’s disease (1); electroencephalography (1); medical signal processing (1); speech intelligibility (1); biomedical electrodes (1); prosthetics (1); speech (1); medical disorders (1); speech synthesis (1); neuroprosthesis (1); low latency processing of neural signals (1); stereo tactic eeg (1); neurophysiology (1); multilingual (1); selective auditory attention (1); cocktail party problem (1); grammars (1); target language extraction (1); natural languages (1); sub word segmentation (1); malayalam (1); language modelling (1); image segmentation (1); oov (1); code switching (1); low resource languages (1); healthcare (1); intelligent medicine (1); medical computing (1); computer vision (1); audio signal processing (1); computer audition (1); health care (1); digital phenotype (1); overview (1); modeling units (1); end to end asr (1); decoding (1); out of vocabulary (1); automatic speech recognition (1); dnn (1); hidden markov models (1); linguistics (1)
Most Publications: 2014 (36), 2013 (24), 2009 (24), 2012 (23), 2022 (22)

Affiliations
University of Bremen, Cognitive Systems Lab, Germany
Carnegie Mellon University, Pittsburgh, USA (former)

SpeechComm2022 Martha Yifiru Tachbelie, Solomon Teferra Abate, Tanja Schultz
Multilingual speech recognition for GlobalPhone languages.

ICASSP2022 Ayimnisagul Ablimit, Catarina Botelho, Alberto Abad, Tanja Schultz, Isabel Trancoso, 
Exploring Dementia Detection from Speech: Cross Corpus Analysis.

ICASSP2022 Miguel Angrick, Maarten C. Ottenhoff, Lorenz Diener, Darius Ivucic, Gabriel Ivucic, Sophocles Goulis, Albert J. Colon, G. Louis Wagner, Dean J. Krusienski, Pieter L. Kubben, Tanja Schultz, Christian Herff, 
Towards Closed-Loop Speech Synthesis from Stereotactic EEG: A Unit Selection Approach.

ICASSP2022 Marvin Borsdorf, Kevin Scheck, Haizhou Li 0001, Tanja Schultz
Experts Versus All-Rounders: Target Language Extraction for Multiple Target Languages.

ICASSP2022 Sreeja Manghat, Sreeram Manghat, Tanja Schultz
Hybrid sub-word segmentation for handling long tail in morphologically rich low resource languages.

ICASSP2022 Kun Qian 0003, Tanja Schultz, Björn W. Schuller, 
An Overview of the FIRST ICASSP Special Session on Computer Audition for Healthcare.

Interspeech2022 Ayimnisagul Ablimit, Karen Scholz, Tanja Schultz
Deep Learning Approaches for Detecting Alzheimer's Dementia from Conversational Speech of ILSE Study.

Interspeech2022 Marvin Borsdorf, Kevin Scheck, Haizhou Li 0001, Tanja Schultz
Blind Language Separation: Disentangling Multilingual Cocktail Party Voices by Language.

Interspeech2022 Catarina Botelho, Tanja Schultz, Alberto Abad, Isabel Trancoso, 
Challenges of using longitudinal and cross-domain corpora on studies of pathological speech.

Interspeech2022 Sreeram Manghat, Sreeja Manghat, Tanja Schultz
Normalization of code-switched text for speech synthesis.

ICASSP2021 Solomon Teferra Abate, Martha Yifiru Tachbelie, Tanja Schultz
End-to-End Multilingual Automatic Speech Recognition for Less-Resourced Languages: The Case of Four Ethiopian Languages.

Interspeech2021 Marvin Borsdorf, Chenglin Xu, Haizhou Li 0001, Tanja Schultz
Universal Speaker Extraction in the Presence and Absence of Target Speakers for Speech of One and Two Talkers.

Interspeech2021 Marvin Borsdorf, Chenglin Xu, Haizhou Li 0001, Tanja Schultz
GlobalPhone Mix-To-Separate Out of 2: A Multilingual 2000 Speakers Mixtures Database for Speech Separation.

Interspeech2021 Catarina Botelho, Alberto Abad, Tanja Schultz, Isabel Trancoso, 
Visual Speech for Obstructive Sleep Apnea Detection.

Interspeech2021 Lars Steinert, Felix Putze, Dennis Küster, Tanja Schultz
Audio-Visual Recognition of Emotional Engagement of People with Dementia.

ICASSP2020 Solomon Teferra Abate, Martha Yifiru Tachbelie, Tanja Schultz
Deep Neural Networks Based Automatic Speech Recognition for Four Ethiopian Languages.

ICASSP2020 Martha Yifiru Tachbelie, Ayimunishagu Abulimiti, Solomon Teferra Abate, Tanja Schultz
DNN-Based Speech Recognition for Globalphone Languages.

Interspeech2020 Solomon Teferra Abate, Martha Yifiru Tachbelie, Tanja Schultz
Multilingual Acoustic and Language Modeling for Ethio-Semitic Languages.

Interspeech2020 Ayimunishagu Abulimiti, Jochen Weiner, Tanja Schultz
Automatic Speech Recognition for ILSE-Interviews: Longitudinal Conversational Speech Recordings Covering Aging and Cognitive Decline.

Interspeech2020 Miguel Angrick, Christian Herff, Garett D. Johnson, Jerry J. Shih, Dean J. Krusienski, Tanja Schultz
Speech Spectrogram Estimation from Intracranial Brain Activity Using a Quantization Approach.

#106  | Petr Motlícek | Google Scholar   DBLP
Venues: Interspeech: 21, ICASSP: 9, SpeechComm: 2
Years: 2022: 1, 2021: 9, 2020: 4, 2019: 3, 2018: 5, 2017: 4, 2016: 6
ISCA Sections: automatic speech recognition in air traffic management: 3; show and tell: 1; embedding and network architecture for speaker recognition: 1; openasr20 and low resource asr development: 1; voice activity detection: 1; assessment of pathological speech and language: 1; multilingual and code-switched asr: 1; learning techniques for speaker recognition: 1; applications of asr: 1; model adaptation for asr: 1; cross-lingual and multilingual asr: 1; speaker verification using neural network methods: 1; deep learning for source separation and pitch tracking: 1; speaker verification: 1; multimodal systems: 1; short utterances speaker recognition: 1; acoustic model adaptation: 1; speaker recognition: 1; speech synthesis: 1
IEEE Keywords: speaker recognition: 5; speech recognition: 3; probability: 3; finite state transducers: 2; natural language processing: 2; speaker verification: 2; i vectors: 2; data handling: 1; software reliability: 1; human computer interaction: 1; callsign detection: 1; aerospace computing: 1; automatic speech recognition: 1; air traffic: 1; air surveillance data: 1; video surveillance: 1; aircraft communication: 1; air traffic control: 1; risk analysis: 1; oov word recognition: 1; speech dataset: 1; multi genre speech recognition: 1; semi supervised learning: 1; incremental training: 1; sensor fusion: 1; bayesian fusion: 1; inter task fusion: 1; speaker embedding: 1; content mismatch: 1; plda: 1; multi session training: 1; covariance matrices: 1; gaussian processes: 1; text dependent speaker verification: 1; dynamic time warping: 1; text analysis: 1; hidden markov models: 1; mixture models: 1; dnn posterior: 1; domain adaptation: 1; information theoretic measures: 1; plda model: 1; unsupervised learning: 1; fusion: 1; ward: 1; speaker diarization: 1; i vector: 1; longitudinal: 1; clustering: 1; audio recording: 1; linking: 1; television broadcasting: 1
Most Publications: 2022: 22, 2021: 19, 2020: 17, 2019: 13, 2012: 11

ICASSP2022 Iuliia Nigmatulina, Juan Zuluaga-Gomez, Amrutha Prasad, Seyyed Saeed Sarfjoo, Petr Motlícek
A Two-Step Approach to Leverage Contextual Data: Speech Recognition in Air-Traffic Communications.

ICASSP2021 Rudolf A. Braun, Srikanth R. Madikeri, Petr Motlícek
A Comparison of Methods for OOV-Word Recognition on a New Public Dataset.

Interspeech2021 Maël Fabien, Shantipriya Parida, Petr Motlícek, Dawei Zhu, Aravind Krishnan, Hoang H. Nguyen, 
ROXANNE Research Platform: Automate Criminal Investigations.

Interspeech2021 Weipeng He, Petr Motlícek, Jean-Marc Odobez, 
Multi-Task Neural Network for Robust Multiple Speaker Embedding Extraction.

Interspeech2021 Martin Kocour, Karel Veselý, Alexander Blatt, Juan Zuluaga-Gomez, Igor Szöke, Jan Cernocký, Dietrich Klakow, Petr Motlícek
Boosting of Contextual Information in ASR for Air-Traffic Call-Sign Recognition.

Interspeech2021 Srikanth R. Madikeri, Petr Motlícek, Hervé Bourlard, 
Multitask Adaptation with Lattice-Free MMI for Multi-Genre Speech Recognition of Low Resource Languages.

Interspeech2021 Oliver Ohneiser, Seyyed Saeed Sarfjoo, Hartmut Helmke, Shruthi Shetty, Petr Motlícek, Matthias Kleinert, Heiko Ehr, Sarunas Murauskas, 
Robust Command Recognition for Lithuanian Air Traffic Control Tower Utterances.

Interspeech2021 Seyyed Saeed Sarfjoo, Srikanth R. Madikeri, Petr Motlícek
Speech Activity Detection Based on Multilingual Speech Recognition System.

Interspeech2021 Esaú Villatoro-Tello, S. Pavankumar Dubagunta, Julian Fritsch, Gabriela Ramírez-de-la-Rosa, Petr Motlícek, Mathew Magimai-Doss, 
Late Fusion of the Available Lexicon and Raw Waveform-Based Acoustic Modeling for Depression and Dementia Recognition.

Interspeech2021 Juan Zuluaga-Gomez, Iuliia Nigmatulina, Amrutha Prasad, Petr Motlícek, Karel Veselý, Martin Kocour, Igor Szöke, 
Contextual Semi-Supervised Learning: An Approach to Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems.

ICASSP2020 Banriskhem K. Khonglah, Srikanth R. Madikeri, Subhadeep Dey, Hervé Bourlard, Petr Motlícek, Jayadev Billa, 
Incremental Semi-Supervised Learning for Multi-Genre Speech Recognition.

Interspeech2020 Srikanth R. Madikeri, Banriskhem K. Khonglah, Sibo Tong, Petr Motlícek, Hervé Bourlard, Daniel Povey, 
Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems.

Interspeech2020 Seyyed Saeed Sarfjoo, Srikanth R. Madikeri, Petr Motlícek, Sébastien Marcel, 
Supervised Domain Adaptation for Text-Independent Speaker Verification Using Limited Data.

Interspeech2020 Juan Zuluaga-Gomez, Petr Motlícek, Qingran Zhan, Karel Veselý, Rudolf A. Braun, 
Automatic Speech Recognition Benchmark for Air-Traffic Communications.

ICASSP2019 Srikanth R. Madikeri, Petr Motlícek, Subhadeep Dey, 
A Bayesian Approach to Inter-task Fusion for Speaker Recognition.

Interspeech2019 Subhadeep Dey, Petr Motlícek, Trung Bui, Franck Dernoncourt, 
Exploiting Semi-Supervised Training Through a Dropout Regularization in End-to-End Speech Recognition.

Interspeech2019 Thibault Viglino, Petr Motlícek, Milos Cernak, 
End-to-End Accented Speech Recognition.

ICASSP2018 Subhadeep Dey, Takafumi Koshinaka, Petr Motlícek, Srikanth R. Madikeri, 
DNN Based Speaker Embedding Using Content Information for Text-Dependent Speaker Verification.

Interspeech2018 Subhadeep Dey, Srikanth R. Madikeri, Petr Motlícek
End-to-end Text-dependent Speaker Verification Using Novel Distance Measures.

Interspeech2018 Weipeng He, Petr Motlícek, Jean-Marc Odobez, 
Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network.

#107  | Wei-Ning Hsu | Google Scholar   DBLP
Venues: Interspeech: 16, ACL: 4, ICASSP: 4, ICLR: 3, NeurIPS: 2, ICML: 1, NAACL: 1, TASLP: 1
Years: 2022: 12, 2021: 6, 2020: 3, 2019: 5, 2018: 4, 2017: 1, 2016: 1
ISCA Sections: speech synthesis: 2; spoken language processing: 1; zero, low-resource and multi-modal speech recognition: 1; speaker recognition and anti-spoofing: 1; resource-constrained asr: 1; self-supervision and semi-supervision for neural asr training: 1; speech signal representation: 1; new trends in self-supervised speech processing: 1; speech signal characterization: 1; speech recognition and beyond: 1; deep neural networks: 1; robust speech recognition: 1; neural network training strategies for asr: 1; voice conversion: 1; new trends in neural networks for speech recognition: 1
IEEE Keywords: unsupervised learning: 3; speech recognition: 3; pre training: 2; speech synthesis: 2; pattern clustering: 1; bert: 1; natural language processing: 1; self supervised learning: 1; supervised learning: 1; representation learning: 1; tacotron: 1; data efficiency: 1; semi supervised learning: 1; text to speech: 1; text analysis: 1; variational autoencoder: 1; speaker recognition: 1; adversarial training: 1; text to speech synthesis: 1; data augmentation: 1; speech coding: 1; domain invariant representations: 1; robust speech recognition: 1; factorized hierarchical variational autoencoder: 1
Most Publications: 2022: 34, 2021: 21, 2018: 14, 2023: 10, 2019: 9

Interspeech2022 Alexander H. Liu, Cheng-I Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James R. Glass, 
Simple and Effective Unsupervised Speech Synthesis.

Interspeech2022 Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino 0001, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee 0001, 
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation.

Interspeech2022 Bowen Shi, Wei-Ning Hsu, Abdelrahman Mohamed, 
Robust Self-Supervised Audio-Visual Speech Recognition.

Interspeech2022 Bowen Shi, Abdelrahman Mohamed, Wei-Ning Hsu
Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT.

Interspeech2022 Apoorv Vyas, Wei-Ning Hsu, Michael Auli, Alexei Baevski, 
On-demand compute reduction with stochastic wav2vec 2.0.

ICML2022 Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli, 
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language.

NeurIPS2022 Wei-Ning Hsu, Bowen Shi, 
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality.

ICLR2022 Bowen Shi, Wei-Ning Hsu, Kushal Lakhotia, Abdelrahman Mohamed, 
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction.

ACL2022 Eugene Kharitonov, Ann Lee 0001, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu
Text-Free Prosody-Aware Generative Spoken Language Modeling.

ACL2022 Ann Lee 0001, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Sravya Popuri, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang 0002, Juan Pino 0001, Wei-Ning Hsu
Direct Speech-to-Speech Translation With Discrete Units.

ACL2022 Yun Tang 0002, Hongyu Gong, Ning Dong, Changhan Wang, Wei-Ning Hsu, Jiatao Gu, Alexei Baevski, Xian Li, Abdelrahman Mohamed, Michael Auli, Juan Miguel Pino, 
Unified Speech-Text Pre-training for Speech Translation and Recognition.

NAACL2022 Ann Lee 0001, Hongyu Gong, Paul-Ambroise Duquenne, Holger Schwenk, Peng-Jen Chen, Changhan Wang, Sravya Popuri, Yossi Adi, Juan Miguel Pino, Jiatao Gu, Wei-Ning Hsu
Textless Speech-to-Speech Translation on Real Data.

TASLP2021 Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed, 
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units.

ICASSP2021 Wei-Ning Hsu, Yao-Hung Hubert Tsai, Benjamin Bolte, Ruslan Salakhutdinov, Abdelrahman Mohamed, 
Hubert: How Much Can a Bad Teacher Benefit ASR Pre-Training?

Interspeech2021 Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee 0001, Ronan Collobert, Gabriel Synnaeve, Michael Auli, 
Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training.

Interspeech2021 Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, 
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.

NeurIPS2021 Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli, 
Unsupervised Speech Recognition.

ACL2021 Wei-Ning Hsu, David Harwath, Tyler Miller, Christopher Song, James R. Glass, 
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units.

Interspeech2020 Michael Gump, Wei-Ning Hsu, James R. Glass, 
Unsupervised Methods for Evaluating Speech Representations.

Interspeech2020 Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James R. Glass, 
A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning.

#108  | Ming Li 0026 | Google Scholar   DBLP
Venues: Interspeech: 22, ICASSP: 9, TASLP: 1
Years: 2022: 7, 2021: 6, 2020: 7, 2019: 8, 2018: 2, 2017: 2
ISCA Sections: speech synthesis: 3; speaker embedding and diarization: 2; special session: 2; spoofing-aware automatic speaker verification (sasv): 1; spoken term detection & voice search: 1; sdsv challenge 2021: 1; robust speaker recognition: 1; feature, embedding and neural architecture for speaker recognition: 1; targeted source separation: 1; speaker diarization: 1; the fearless steps challenge phase-02: 1; the interspeech 2020 far field speaker verification challenge: 1; the voices from a distance challenge: 1; speaker recognition: 1; the 2019 automatic speaker verification spoofing and countermeasures challenge: 1; speaker recognition and diarization: 1; speaker and language recognition: 1; speaker verification using neural network methods: 1
IEEE Keywords: speaker recognition: 8; target speaker voice activity detection: 3; voice activity detection: 3; natural language processing: 3; speaker verification: 2; deep learning (artificial intelligence): 2; speaker embedding: 2; end to end: 2; utterance level: 2; pattern clustering: 1; speaker diarization: 1; lightweight speaker verification: 1; asymmetric enroll verify structure: 1; ecapa tdnnlite: 1; end to end speaker diarization: 1; multichannel speaker diarization: 1; iterative methods: 1; clustering: 1; contrastive learning: 1; self supervised learning: 1; neural network: 1; robustness: 1; noisy conditions: 1; far field: 1; open source database: 1; microphone arrays: 1; multichannel: 1; text dependent: 1; convolutional neural nets: 1; language identification: 1; cnn blstm: 1; attention: 1; speech recognition: 1; probability: 1; fundamental frequency: 1; electrolaryngeal speech: 1; speech intelligibility: 1; speech enhancement: 1; voice conversion: 1; phonetic feature: 1; speech coding: 1; gaussian processes: 1; language identification (lid): 1; variable length: 1; encoding: 1; mixture models: 1; encoding layer: 1
Most Publications: 2022: 29, 2020: 25, 2021: 24, 2019: 19, 2018: 14

Affiliations
Duke Kunshan University, Data Science Research Center, China
Sun Yat-Sen University Carnegie Mellon University Joint Institute of Engineering, China (former)
University of Southern California, Los Angeles, CA, USA (former)
Chinese Academy of Sciences, Institute of Acoustics, China (former)

TASLP2022 Weiqing Wang, Qingjian Lin, Danwei Cai, Ming Li 0026
Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization.

ICASSP2022 Qingjian Li, Lin Yang, Xuyang Wang, Xiaoyi Qin, Junjie Wang, Ming Li 0026
Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification.

ICASSP2022 Weiqing Wang, Ming Li 0026
Incorporating End-to-End Framework Into Target-Speaker Voice Activity Detection.

ICASSP2022 Weiqing Wang, Xiaoyi Qin, Ming Li 0026
Cross-Channel Attention-Based Target Speaker Voice Activity Detection: Experimental Results for the M2met Challenge.

Interspeech2022 Xiaoyi Qin, Na Li 0012, Chao Weng, Dan Su 0002, Ming Li 0026
Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings.

Interspeech2022 Weiqing Wang, Ming Li 0026, Qingjian Lin, 
Online Target Speaker Voice Activity Detection for Speaker Diarization.

Interspeech2022 Xingming Wang, Xiaoyi Qin, Yikang Wang, Yunfei Xu, Ming Li 0026
The DKU-OPPO System for the 2022 Spoofing-Aware Speaker Verification Challenge.

ICASSP2021 Danwei Cai, Weiqing Wang, Ming Li 0026
An Iterative Framework for Self-Supervised Deep Speaker Representation Learning.

Interspeech2021 Yan Jia, Xingming Wang, Xiaoyi Qin, Yinping Zhang, Xuyang Wang, Junjie Wang, Dong Zhang, Ming Li 0026
The 2020 Personalized Voice Trigger Challenge: Open Datasets, Evaluation Metrics, Baseline System and Results.

Interspeech2021 Xiaoyi Qin, Chao Wang, Yong Ma, Min Liu, Shilei Zhang, Ming Li 0026
Our Learned Lessons from Cross-Lingual Speaker Verification: The CRMI-DKU System Description for the Short-Duration Speaker Verification Challenge 2021.

Interspeech2021 Yao Shi, Hui Bu, Xin Xu, Shaoji Zhang, Ming Li 0026
AISHELL-3: A Multi-Speaker Mandarin TTS Corpus.

Interspeech2021 Weiqing Wang, Danwei Cai, Jin Wang, Qingjian Lin, Xuyang Wang, Mi Hong, Ming Li 0026
The DKU-Duke-Lenovo System Description for the Fearless Steps Challenge Phase III.

Interspeech2021 Tinglong Zhu, Xiaoyi Qin, Ming Li 0026
Binary Neural Network for Speaker Verification.

ICASSP2020 Danwei Cai, Weicheng Cai, Ming Li 0026
Within-Sample Variability-Invariant Loss for Robust Speaker Recognition Under Noisy Environments.

ICASSP2020 Xiaoyi Qin, Hui Bu, Ming Li 0026
HI-MIA: A Far-Field Text-Dependent Speaker Verification Database and the Baselines.

Interspeech2020 Zexin Cai, Chuxiong Zhang, Ming Li 0026
From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint.

Interspeech2020 Tingle Li, Qingjian Lin, Yuanyuan Bao, Ming Li 0026
Atss-Net: Target Speaker Separation via Attention-Based Neural Network.

Interspeech2020 Qingjian Lin, Yu Hou, Ming Li 0026
Self-Attentive Similarity Measurement Strategies in Speaker Diarization.

Interspeech2020 Qingjian Lin, Tingle Li, Ming Li 0026
The DKU Speech Activity Detection and Speaker Identification Systems for Fearless Steps Challenge Phase-02.

Interspeech2020 Xiaoyi Qin, Ming Li 0026, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, Haizhou Li 0001, 
The INTERSPEECH 2020 Far-Field Speaker Verification Challenge.

#109  | Peter Bell 0001 | Google Scholar   DBLP
Venues: Interspeech: 24, ICASSP: 8
Years: 2022: 4, 2021: 8, 2020: 7, 2019: 5, 2018: 1, 2017: 4, 2016: 3
ISCA Sections: feature extraction and distant asr: 3; acoustic model adaptation: 2; cross/multi-lingual asr: 1; robust speaker recognition: 1; spoken dialogue systems: 1; linguistic components in end-to-end asr: 1; topics in asr: 1; embedding and network architecture for speaker recognition: 1; spoken language processing: 1; self-supervision and semi-supervision for neural asr training: 1; neural network training methods for asr: 1; asr model training and strategies: 1; asr neural network training: 1; model training for asr: 1; feature extraction for asr: 1; asr neural network architectures: 1; show & tell: 1; multi-lingual models and adaptation for asr: 1; spoken document processing: 1; language recognition: 1; language model adaptation: 1
IEEE Keywords: speech recognition: 7; natural language processing: 3; sensor fusion: 2; acoustic modelling: 2; decoding: 2; emotion recognition: 1; multi task learning: 1; speech emotion recognition: 1; automatic speech recognition: 1; wav2vec 2.0: 1; raw phase spectrum: 1; asr: 1; multi head cnns: 1; phase based source filter separation: 1; top down training: 1; recurrent neural nets: 1; general classifier: 1; layer wise training: 1; language model: 1; domain adaptation: 1; multilingual speech recognition: 1; adversarial learning: 1; domain adversarial training: 1; speaker verification: 1; diarization: 1; deep neural network: 1; speaker recognition: 1; signal representation: 1; convolutional neural nets: 1; computer vision: 1; signal resolution: 1; low pass filters: 1; end to end: 1; attention: 1; language translation: 1; punctuation: 1; encoding: 1; neural machine translation: 1; rich transcription: 1
Most Publications: 2021: 17, 2020: 17, 2019: 16, 2022: 12, 2023: 10

Affiliations
University of Edinburgh, Centre for Speech Technology Research, UK

ICASSP2022 Yuanchao Li, Peter Bell 0001, Catherine Lai, 
Fusing ASR Outputs in Joint Training for Speech Emotion Recognition.

Interspeech2022 Ondrej Klejch, Electra Wallington, Peter Bell 0001
Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR.

Interspeech2022 Chau Luu, Steve Renals, Peter Bell 0001
Investigating the contribution of speaker attributes to speaker separability using disentangled speaker representations.

Interspeech2022 Sarenne Carrol Wallbridge, Catherine Lai, Peter Bell 0001
Investigating perception of spoken dialogue acceptability through surprisal.

ICASSP2021 Erfan Loweimi, Zoran Cvetkovic, Peter Bell 0001, Steve Renals, 
Speech Acoustic Modelling from Raw Phase Spectrum.

ICASSP2021 Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell 0001, Steve Renals, 
Train Your Classifier First: Cascade Neural Networks Training from Upper Layers to Lower Layers.

Interspeech2021 Ondrej Klejch, Electra Wallington, Peter Bell 0001
The CSTR System for Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages.

Interspeech2021 Erfan Loweimi, Zoran Cvetkovic, Peter Bell 0001, Steve Renals, 
Speech Acoustic Modelling Using Raw Source and Filter Components.

Interspeech2021 Chau Luu, Peter Bell 0001, Steve Renals, 
Leveraging Speaker Attribute Information Using Multi Task Learning for Speaker Verification and Diarization.

Interspeech2021 Sarenne Wallbridge, Peter Bell 0001, Catherine Lai, 
It's Not What You Said, it's How You Said it: Discriminative Perception of Speech as a Multichannel Communication System.

Interspeech2021 Electra Wallington, Benji Kershenbaum, Ondrej Klejch, Peter Bell 0001
On the Learning Dynamics of Semi-Supervised Training for ASR.

Interspeech2021 Shucong Zhang, Erfan Loweimi, Peter Bell 0001, Steve Renals, 
Stochastic Attention Head Removal: A Simple and Effective Method for Improving Transformer Based ASR Models.

ICASSP2020 Alberto Abad, Peter Bell 0001, Andrea Carmantini, Steve Renals, 
Cross Lingual Transfer Learning for Zero-Resource Domain Adaptation.

ICASSP2020 Chau Luu, Peter Bell 0001, Steve Renals, 
Channel Adversarial Training for Speaker Verification and Diarization.

ICASSP2020 Joanna Rownicka, Peter Bell 0001, Steve Renals, 
Multi-Scale Octave Convolutions for Robust Speech Recognition.

Interspeech2020 Neethu M. Joy, Dino Oglic, Zoran Cvetkovic, Peter Bell 0001, Steve Renals, 
Deep Scattering Power Spectrum Features for Robust Speech Recognition.

Interspeech2020 Erfan Loweimi, Peter Bell 0001, Steve Renals, 
On the Robustness and Training Dynamics of Raw Waveform Models.

Interspeech2020 Erfan Loweimi, Peter Bell 0001, Steve Renals, 
Raw Sign and Magnitude Spectra for Multi-Head Acoustic Modelling.

Interspeech2020 Dino Oglic, Zoran Cvetkovic, Peter Bell 0001, Steve Renals, 
A Deep 2D Convolutional Network for Waveform-Based Speech Recognition.

ICASSP2019 Shucong Zhang, Erfan Loweimi, Peter Bell 0001, Steve Renals, 
Windowed Attention Mechanisms for Speech Recognition.

#110  | Yuexian Zou | Google Scholar   DBLP
Venues: Interspeech: 18, ICASSP: 12, AAAI: 1, IJCAI: 1
Years: 2022: 8, 2021: 13, 2020: 7, 2019: 2, 2018: 2
ISCA Sections: acoustic event detection and classification: 3; spoken dialogue systems: 3; source separation: 2; acoustic event detection: 2; multi-, cross-lingual and other topics in asr: 1; spoken term detection & voice search: 1; acoustic event detection and acoustic scene classification: 1; speech signal analysis and representation: 1; speaker embedding: 1; speech enhancement: 1; the interspeech 2018 computational paralinguistics challenge (compare): 1; source separation and spatial analysis: 1
IEEE Keywords: natural language processing: 3; audio signal processing: 3; supervised learning: 2; multiple instance learning: 2; text analysis: 2; audio tagging: 2; image segmentation: 2; filtering theory: 2; query processing: 1; slot filling: 1; multiple intent detection: 1; self distillation: 1; transductive inference: 1; few shot learning: 1; sound event detection: 1; mutual learning: 1; deep learning (artificial intelligence): 1; signal detection: 1; multi granularity representation: 1; machine reading comprehension: 1; question answering: 1; aspect based sentiment information: 1; bert: 1; multitask learning: 1; interactive systems: 1; iteratively co interactive network: 1; spoken language understanding: 1; two stream framework: 1; class wise attentional clips: 1; weak labels: 1; temporal modeling: 1; image motion analysis: 1; motion representation: 1; image sequences: 1; image recognition: 1; image representation: 1; video signal processing: 1; action recognition: 1; video understanding: 1; object recognition: 1; speaker recognition: 1; unsupervised learning: 1; speaker verification: 1; contrastive learning: 1; self supervised learning: 1; two stage segmentation: 1; self attention: 1; curve text: 1; scene text detection: 1; text detection: 1; multi channel speech separation: 1; spatial features: 1; end to end: 1; speech enhancement: 1; spatial filters: 1; inter channel convolution differences: 1; reverberation: 1; speech recognition: 1; convolutional neural nets: 1; object detection: 1; spatial attention: 1; channel wise attention: 1; weakly labelled data: 1; realistic images: 1; image resolution: 1; semantic information preserved loss: 1; s2pt: 1; structural correlations between images textures: 1; image texture: 1; generative adversarial networks: 1; correlated residual blocks: 1; probability: 1; audio visual systems: 1; particle flow: 1; particle filtering (numerical methods): 1; object tracking: 1; monte carlo methods: 1; pattern clustering: 1; smc phd filter: 1; audio visual tracking: 1
Most Publications: 2021: 59, 2022: 54, 2020: 34, 2019: 31, 2023: 24

ICASSP2022 Lisong Chen, Peilin Zhou, Yuexian Zou
Joint Multiple Intent Detection and Slot Filling Via Self-Distillation.

ICASSP2022 Dongchao Yang, Helin Wang, Yuexian Zou, Zhongjie Ye, Wenwu Wang 0001, 
A Mutual Learning Framework for Few-Shot Sound Event Detection.

Interspeech2022 Jinchuan Tian, Jianwei Yu, Chunlei Zhang, Yuexian Zou, Dong Yu 0001, 
LAE: Language-Aware Encoder for Monolingual and Multilingual ASR.

Interspeech2022 Helin Wang, Dongchao Yang, Chao Weng, Jianwei Yu, Yuexian Zou
Improving Target Sound Extraction with Timestamp Information.

Interspeech2022 Yifei Xin, Dongchao Yang, Yuexian Zou
Audio Pyramid Transformer with Domain Adaption for Weakly Supervised Sound Event Detection and Audio Classification.

Interspeech2022 Dongchao Yang, Helin Wang, Zhongjie Ye, Yuexian Zou, Wenwu Wang 0001, 
RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection.

Interspeech2022 Zifeng Zhao, Rongzhi Gu, Dongchao Yang, Jinchuan Tian, Yuexian Zou
Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction.

Interspeech2022 Zifeng Zhao, Dongchao Yang, Rongzhi Gu, Haoran Zhang, Yuexian Zou
Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches.

ICASSP2021 Nuo Chen, Fenglin Liu, Chenyu You, Peilin Zhou, Yuexian Zou
Adaptive Bi-Directional Attention: Exploring Multi-Granularity Representations for Machine Reading Comprehension.

ICASSP2021 Zhiqi Huang, Fenglin Liu, Peilin Zhou, Yuexian Zou
Sentiment Injected Iteratively Co-Interactive Network for Spoken Language Understanding.

ICASSP2021 Helin Wang, Yuexian Zou, Wenwu Wang 0001, 
A Global-Local Attention Framework for Weakly Labelled Audio Tagging.

ICASSP2021 Liyu Wu, Yuexian Zou, Can Zhang 0001, 
Long-Short Temporal Modeling for Efficient Action Recognition.

ICASSP2021 Haoran Zhang, Yuexian Zou, Helin Wang, 
Contrastive Self-Supervised Learning for Text-Independent Speaker Verification.

Interspeech2021 Nuo Chen, Chenyu You, Yuexian Zou
Self-Supervised Dialogue Learning for Spoken Conversational Question Answering.

Interspeech2021 Li Wang, Rongzhi Gu, Nuo Chen, Yuexian Zou
Text Anchor Based Metric Learning for Small-Footprint Keyword Spotting.

Interspeech2021 Helin Wang, Yuexian Zou, Wenwu Wang 0001, 
SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification.

Interspeech2021 Weiyuan Xu, Peilin Zhou, Chenyu You, Yuexian Zou
Semantic Transportation Prototypical Network for Few-Shot Intent Detection.

Interspeech2021 Dongchao Yang, Helin Wang, Yuexian Zou
Unsupervised Multi-Target Domain Adaptation for Acoustic Scene Classification.

Interspeech2021 Chenyu You, Nuo Chen, Yuexian Zou
Contextualized Attention-Based Knowledge Transfer for Spoken Conversational Question Answering.

AAAI2021 Zhiqi Huang, Fenglin Liu, Xian Wu, Shen Ge, Helin Wang, Wei Fan 0001, Yuexian Zou
Audio-Oriented Multimodal Machine Comprehension via Dynamic Inter- and Intra-modality Attention.

#111  | Jen-Tzung Chien | Google Scholar   DBLP
Venues: Interspeech: 14, ICASSP: 11, TASLP: 6
Years: 2022: 3, 2021: 3, 2020: 4, 2019: 6, 2018: 4, 2017: 5, 2016: 6
ISCA Sections: speaker recognition: 2; neural network acoustic models for asr: 2; spoken language processing: 1; novel neural network architectures for asr: 1; spoken dialogue systems: 1; neural networks for language modeling: 1; spoken dialogue system: 1; dialogue speech understanding: 1; semantic analysis and classification: 1; source separation and auditory scene analysis: 1; acoustic modeling with neural networks: 1; source separation and spatial audio: 1
IEEE Keywords: speaker recognition: 6; natural language processing: 5; recurrent neural nets: 5; source separation: 4; speaker verification: 4; i vectors: 4; speech recognition: 3; variational autoencoder: 3; domain adaptation: 3; bayes methods: 3; recurrent neural network: 2; backpropagation: 2; speech enhancement: 2; stochastic processes: 2; markov processes: 2; hierarchical model: 2; maximum mean discrepancy: 2; regression analysis: 2; mixture of plda: 2; pattern clustering: 2; policy optimization: 1; optimisation: 1; reinforcement learning: 1; pattern classification: 1; document representation: 1; natural language understanding: 1; document handling: 1; data augmentation: 1; language translation: 1; mask language model: 1; adversarial learning: 1; sequential learning: 1; transformer: 1; minimax techniques: 1; normalizing flow: 1; autoregressive processes: 1; dialogue generation: 1; interactive systems: 1; mutual information: 1; domain adversarial training: 1; gaussian distribution: 1; speaker verification (sv): 1; speech intelligibility: 1; markov state: 1; latent variable model: 1; stochastic transition: 1; deep sequential learning: 1; sequence generation: 1; image representation: 1; x vectors: 1; nonparametric statistics: 1; topic model: 1; trees (mathematics): 1; bayesian nonparametrics (bnps): 1; text analysis: 1; sparse model: 1; catering industry: 1; text mining: 1; speech coding: 1; statistical distributions: 1; factorized error backpropagation: 1; speech dereverberation: 1; matrix algebra: 1; spectro temporal neural factorization: 1; reverberation: 1; sequences: 1; recall neural network: 1; long short term memory: 1; sequence to sequence learning: 1; turing machines: 1; probability: 1; deep neural networks: 1; variational manifold learning: 1; probabilistic linear discriminant analysis: 1; sampling methods: 1; gradient methods: 1; bayesian learning: 1; variational techniques: 1; inference mechanisms: 1; poisson distribution: 1; model complexity: 1; monaural source separation: 1; matrix decomposition: 1; nonnegative matrix factorization: 1; mixture models: 1; noise robustness: 1; probabilistic lda: 1; computational linguistics: 1; pathological speech: 1; acoustical analysis: 1; objective assessment: 1; automatic speech recognition: 1; discriminative learning: 1; signal reconstruction: 1; monaural speech separation: 1; neural network: 1
Most Publications: 2020: 21, 2018: 17, 2021: 16, 2019: 16, 2016: 15


ICASSP2022 Chang-Ting Chu, Mahdin Rohmatillah, Ching-Hsien Lee, Jen-Tzung Chien
Augmentation Strategy Optimization for Language Understanding.

ICASSP2022 Hou Lio, Shang-En Li, Jen-Tzung Chien
Adversarial Mask Transformer for Sequential Learning.

Interspeech2022 Jen-Tzung Chien, Yu-Han Huang, 
Bayesian Transformer Using Disentangled Mask Attention.

ICASSP2021 Tien-Ching Luo, Jen-Tzung Chien
Variational Dialogue Generation with Normalizing Flows.

Interspeech2021 Chi-Hang Leong, Yu-Han Huang, Jen-Tzung Chien
Online Compressive Transformer for End-to-End Speech Recognition.

Interspeech2021 Mahdin Rohmatillah, Jen-Tzung Chien
Causal Confusion Reduction for Robust Multi-Domain Dialogue Policy.

TASLP2020 Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien
Variational Domain Adversarial Learning With Mutual Information Maximization for Speaker Verification.

Interspeech2020 Jen-Tzung Chien, Yu-Min Huang, 
Stochastic Convolutional Recurrent Networks for Language Modeling.

Interspeech2020 Jen-Tzung Chien, Po-Chien Hsu, 
Stochastic Curiosity Exploration for Dialogue Systems.

Interspeech2020 Weiwei Lin 0002, Man-Wai Mak, Jen-Tzung Chien
Strategies for End-to-End Text-Independent Speaker Verification.

ICASSP2019 Jen-Tzung Chien, Che-Yu Kuo, 
Stochastic Markov Recurrent Neural Network for Source Separation.

ICASSP2019 Jen-Tzung Chien, Chun-Wei Wang, 
Variational and Hierarchical Recurrent Autoencoder.

ICASSP2019 Wei-Wei Lin 0002, Man-Wai Mak, Youzhi Tu, Jen-Tzung Chien
Semi-supervised Nuisance-attribute Networks for Domain Adaptation.

Interspeech2019 Jen-Tzung Chien, Wei Xiang Lieow, 
Meta Learning for Hyperparameter Optimization in Dialogue System.

Interspeech2019 Jen-Tzung Chien, Chun-Wei Wang, 
Self Attention in Variational Sequential Learning for Summarization.

Interspeech2019 Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien
Variational Domain Adversarial Learning for Speaker Verification.

TASLP2018 Jen-Tzung Chien
Bayesian Nonparametric Learning for Hierarchical and Sparse Topics.

TASLP2018 Wei-Wei Lin 0002, Man-Wai Mak, Jen-Tzung Chien
Multisource I-Vectors Domain Adaptation Using Maximum Mean Discrepancy Based Autoencoders.

ICASSP2018 Jen-Tzung Chien, Kuan-Ting Kuo, 
Spectro-Temporal Neural Factorization for Speech Dereverberation.

ICASSP2018 Jen-Tzung Chien, Kai-Wei Tsou, 
Recall Neural Network for Source Separation.

#112  | Abeer Alwan | Google Scholar   DBLP
VenuesInterspeech: 25ICASSP: 4SpeechComm: 2
Years2022: 102021: 22020: 42019: 42018: 42017: 32016: 4
ISCA Sectionspeaking styles and interaction styles: 2low-resource asr development: 1inclusive and fair speech technologies: 1speech and language in health: 1multimodal speech emotion recognition and paralinguistics: 1non-autoregressive sequential modeling for speech processing: 1topics in asr: 1speaker recognition: 1computational paralinguistics: 1large-scale evaluation of short-duration speaker verification: 1summarization, semantic analysis and classification: 1the interspeech 2019 computational paralinguistics challenge (compare): 1spoken language processing for children’s speech: 1integrating speech science and technology for clinical applications: 1acoustic modelling: 1the interspeech 2018 computational paralinguistics challenge (compare): 1applications in education and learning: 1robust speaker recognition: 1speech and audio segmentation and classification: 1short utterances speaker recognition: 1speaker diarization and recognition: 1special session: 1spoken term detection: 1prosody, phonation and voice quality: 1
IEEE Keywordspeech recognition: 3data augmentation: 2hidden markov models: 1dialect robust asr: 1children’s speech: 1low resource asr: 1linear predictive coding: 1african american english: 1depression detection: 1x vector: 1frame rate: 1time frequency resolution: 1task augmentation: 1child asr: 1meta initialization: 1computer aided instruction: 1kindergarten aged asr: 1speaker perception: 1automatic speaker verification: 1cepstral analysis: 1decision making: 1sensor fusion: 1speaker discrimination: 1voice quality: 1speaker recognition: 1
Most Publications2022: 192013: 112006: 112010: 102009: 10

Affiliations
University of California, Los Angeles, USA

SpeechComm2022 Gary Yeung, Ruchao Fan, Abeer Alwan
Fundamental frequency feature warping for frequency normalization and data augmentation in child automatic speech recognition.

ICASSP2022 Alexander Johnson, Ruchao Fan, Robin Morris, Abeer Alwan
LPC Augment: an LPC-based ASR Data Augmentation Algorithm for Low and Zero-Resource Children's Dialects.

ICASSP2022 Vijay Ravi, Jinhan Wang, Jonathan Flint, Abeer Alwan
Fraug: A Frame Rate Based Data Augmentation Method for Depression Detection from Speech Signals.

ICASSP2022 Yunzheng Zhu, Ruchao Fan, Abeer Alwan
Towards Better Meta-Initialization with Task Augmentation for Kindergarten-Aged Speech Recognition.

Interspeech2022 Amber Afshan, Abeer Alwan
Attention-based conditioning methods using variable frame rate for style-robust speaker verification.

Interspeech2022 Amber Afshan, Abeer Alwan
Learning from human perception to improve automatic speaker verification in style-mismatched conditions.

Interspeech2022 Ruchao Fan, Abeer Alwan
DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR.

Interspeech2022 Alexander Johnson, Kevin Everson, Vijay Ravi, Anissa Gladney, Mari Ostendorf, Abeer Alwan
Automatic Dialect Density Estimation for African American English.

Interspeech2022 Vijay Ravi, Jinhan Wang, Jonathan Flint, Abeer Alwan
A Step Towards Preserving Speakers' Identity While Detecting Depression Via Speaker Disentanglement.

Interspeech2022 Jinhan Wang, Vijay Ravi, Jonathan Flint, Abeer Alwan
Unsupervised Instance Discriminative Learning for Depression Detection from Speech Signals.

Interspeech2021 Ruchao Fan, Wei Chu, Peng Chang 0002, Jing Xiao 0006, Abeer Alwan
An Improved Single Step Non-Autoregressive Transformer for Automatic Speech Recognition.

Interspeech2021 Jinhan Wang, Yunzheng Zhu, Ruchao Fan, Wei Chu, Abeer Alwan
Low Resource German ASR with Untranscribed Data Spoken by Non-Native Children - INTERSPEECH 2021 Shared Task SPAPL System.

Interspeech2020 Amber Afshan, Jinxi Guo, Soo Jin Park, Vijay Ravi, Alan McCree, Abeer Alwan
Variable Frame Rate-Based Data Augmentation to Handle Speaking-Style Variability for Automatic Speaker Verification.

Interspeech2020 Amber Afshan, Jody Kreiman, Abeer Alwan
Speaker Discrimination in Humans and Machines: Effects of Speaking Style Variability.

Interspeech2020 Vijay Ravi, Ruchao Fan, Amber Afshan, Huanhua Lu, Abeer Alwan
Exploring the Use of an Unsupervised Autoregressive Model as a Shared Encoder for Text-Dependent Speaker Verification.

Interspeech2020 Trang Tran, Morgan Tinkler, Gary Yeung, Abeer Alwan, Mari Ostendorf, 
Analysis of Disfluency in Children's Speech.

SpeechComm2019 Jinxi Guo, Ning Xu 0010, Kailun Qian, Yang Shi, Kaiyuan Xu, Yingnian Wu, Abeer Alwan
Deep neural network based i-vector mapping for speaker verification using short utterances.

ICASSP2019 Soo Jin Park, Amber Afshan, Jody Kreiman, Gary Yeung, Abeer Alwan
Target and Non-target Speaker Discrimination by Humans and Machines.

Interspeech2019 Vijay Ravi, Soo Jin Park, Amber Afshan, Abeer Alwan
Voice Quality and Between-Frame Entropy for Sleepiness Estimation.

Interspeech2019 Gary Yeung, Abeer Alwan
A Frequency Normalization Technique for Kindergarten Speech Recognition Inspired by the Role of fo in Vowel Perception.

#113  | Sheng Li 0010 | Google Scholar   DBLP
VenuesInterspeech: 19ICASSP: 11TASLP: 1
Years2022: 92021: 42020: 52019: 52018: 52017: 22016: 1
ISCA Sectionmulti-, cross-lingual and other topics in asr: 1speech quality assessment: 1speech representation: 1zero, low-resource and multi-modal speech recognition: 1dereverberation, noise reduction, and speaker extraction: 1other topics in speech recognition: 1speech enhancement and intelligibility: 1source separation: 1oriental language recognition: 1speech and voice disorders: 1single-channel speech enhancement: 1cross-lingual and multilingual asr: 1spoken term detection, confidence measure, and end-to-end speech recognition: 1nn architectures for asr: 1speech and audio classification: 1acoustic modelling: 1audio events and acoustic scenes: 1language identification: 1speaker and language recognition applications: 1
IEEE Keywordspeech recognition: 10acoustic model: 4knowledge distillation: 3transformer: 2spoken language identification: 2connectionist temporal classification: 2task driven loss: 1feature distillation: 1model compression: 1weighted loss: 1speech separation: 1dynamic mixing: 1hard sample mining: 1speaker recognition: 1data imbalance: 1multi task learning: 1modeling units: 1natural language processing: 1mandarin speech recognition: 1ctc/attention: 1hearing: 1convolutional neural network: 1voice activity detection: 1auditory encoder: 1ear: 1short utterances: 1internal representation learning: 1medical signal processing: 1end to end model: 1dysarthric speech recognition: 1articulatory attribute detection: 1two stage: 1time frequency analysis: 1speech dereverberation: 1multi target learning: 1reverberation: 1spectrograms fusion: 1interactive teacher student learning: 1computer aided instruction: 1teacher model optimization: 1short utterance feature representation: 1natural languages: 1recurrent neural nets: 1long short term memory: 1hidden markov models: 1signal classification: 1conditional entropy: 1entropy: 1loss function: 1semi supervised training: 1dnn: 1lecture transcription: 1pattern classification: 1unsupervised learning: 1unsupervised training: 1
Most Publications2022: 212021: 132020: 132019: 92018: 6

Affiliations
National Institute of Information and Communications Technology (NICT), Universal Communication Research Institute (UCRI), Kyoto, Japan
Kyoto University, Graduate School of Informatics, Japan (2012-2017, PhD 2016)
Shenzhen Institutes of Advanced Technology, Shenzhen, China (2008-2012)
Chinese Academy of Sciences, Beijing, China (2008-2012)
Chinese University of Hong Kong, Hong Kong (2008-2012)
Nanjing University, China (2002-2009)

ICASSP2022 Yongjie Lv, Longbiao Wang, Meng Ge, Sheng Li 0010, Chenchen Ding, Lixin Pan, Yuguang Wang, Jianwu Dang 0001, Kiyoshi Honda, 
Compressing Transformer-Based ASR Model by Task-Driven Loss and Attention-Based Multi-Level Feature Distillation.

ICASSP2022 Kai Wang, Yizhou Peng, Hao Huang, Ying Hu, Sheng Li 0010
Mining Hard Samples Locally And Globally For Improved Speech Separation.

Interspeech2022 Soky Kak, Sheng Li 0010, Masato Mimura, Chenhui Chu, Tatsuya Kawahara, 
Leveraging Simultaneous Translation for Enhancing Transcription of Low-resource Language via Cross Attention Mechanism.

Interspeech2022 Kai Li 0018, Sheng Li 0010, Xugang Lu, Masato Akagi, Meng Liu, Lin Zhang, Chang Zeng, Longbiao Wang, Jianwu Dang 0001, Masashi Unoki, 
Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection.

Interspeech2022 Nan Li, Meng Ge, Longbiao Wang, Masashi Unoki, Sheng Li 0010, Jianwu Dang 0001, 
Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network.

Interspeech2022 Siqing Qin, Longbiao Wang, Sheng Li 0010, Yuqin Lin, Jianwu Dang 0001, 
Finer-grained Modeling units-based Meta-Learning for Low-resource Tibetan Speech Recognition.

Interspeech2022 Hao Shi, Longbiao Wang, Sheng Li 0010, Jianwu Dang 0001, Tatsuya Kawahara, 
Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction.

Interspeech2022 Longfei Yang, Wenqing Wei, Sheng Li 0010, Jiyi Li, Takahiro Shinozaki, 
Augmented Adversarial Self-Supervised Learning for Early-Stage Alzheimer's Speech Detection.

Interspeech2022 Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Sheng Li 0010, Raj Dabre, Raphael Rubino, Yi Zhao, 
Fusion of Self-supervised Learned Models for MOS Prediction.

ICASSP2021 Shunfei Chen, Xinhui Hu, Sheng Li 0010, Xinkang Xu, 
An Investigation of Using Hybrid Modeling Units for Improving End-to-End Speech Recognition System.

ICASSP2021 Nan Li, Longbiao Wang, Masashi Unoki, Sheng Li 0010, Rui Wang, Meng Ge, Jianwu Dang 0001, 
Robust Voice Activity Detection Using a Masked Auditory Encoder Based Convolutional Neural Network.

Interspeech2021 Kai Wang, Hao Huang, Ying Hu, Zhihua Huang, Sheng Li 0010
End-to-End Speech Separation Using Orthogonal Representation in Complex and Real Time-Frequency Domain.

Interspeech2021 Ding Wang, Shuaishuai Ye, Xinhui Hu, Sheng Li 0010, Xinkang Xu, 
An End-to-End Dialect Identification System with Transfer Learning from a Multilingual Automatic Speech Recognition Model.

TASLP2020 Peng Shen, Xugang Lu, Sheng Li 0010, Hisashi Kawai, 
Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification.

ICASSP2020 Yuqin Lin, Longbiao Wang, Jianwu Dang 0001, Sheng Li 0010, Chenchen Ding, 
End-to-End Articulatory Modeling for Dysarthric Articulatory Attribute Detection.

ICASSP2020 Hao Shi, Longbiao Wang, Meng Ge, Sheng Li 0010, Jianwu Dang 0001, 
Spectrograms Fusion with Minimum Difference Masks Estimation for Monaural Speech Dereverberation.

Interspeech2020 Yuqin Lin, Longbiao Wang, Sheng Li 0010, Jianwu Dang 0001, Chenchen Ding, 
Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription.

Interspeech2020 Hao Shi, Longbiao Wang, Sheng Li 0010, Chenchen Ding, Meng Ge, Nan Li, Jianwu Dang 0001, Hiroshi Seki, 
Singing Voice Extraction with Attention-Based Spectrograms Fusion.

ICASSP2019 Peng Shen, Xugang Lu, Sheng Li 0010, Hisashi Kawai, 
Interactive Learning of Teacher-student Model for Short Utterance Spoken Language Identification.

Interspeech2019 Sheng Li 0010, Chenchen Ding, Xugang Lu, Peng Shen, Tatsuya Kawahara, Hisashi Kawai, 
End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition.

#114  | Panayiotis G. Georgiou | Google Scholar   DBLP
VenuesInterspeech: 26ICASSP: 5
Years2021: 12020: 22019: 82018: 52017: 62016: 9
ISCA Sectionbehavioral signal processing and speaker state and traits analytics: 5speaker states and traits: 3speech and language analytics for mental health: 2speaker diarization: 2speech recognition of atypical speech: 1speech and language analytics for medical applications: 1the voices from a distance challenge: 1the second dihard speech diarization challenge (dihard ii): 1speaker recognition and diarization: 1dialogue speech understanding: 1topics in speech and audio signal processing: 1speaker verification: 1speech pathology, depression, and medical applications: 1disorders related to speech and language: 1automatic assessment of emotions: 1speaker diarization and recognition: 1special session: 1speech enhancement and noise reduction: 1
IEEE Keywordbehavioural sciences computing: 2speech recognition: 2speaker recognition: 2asr: 1behavior: 1couples conversations: 1military computing: 1psychology: 1prosody: 1suicidal risk: 1emotion recognition: 1signal classification: 1signal representation: 1affective computing: 1adversarial training: 1affective representation: 1speaker invariant: 1speech emotion recognition: 1entropy: 1speaker role recognition: 1language model: 1lattice rescoring: 1regression analysis: 1medical signal processing: 1stress: 1physiology: 1speech: 1heart rate: 1respiratory sinus arrhythmia: 1cardiology: 1behavior representation: 1information retrieval: 1manifold learning: 1behavior signal processing: 1unsupervised learning: 1
Most Publications2019: 232018: 172013: 172011: 172016: 13

Affiliations
University of Southern California, Los Angeles, CA, USA

Interspeech2021 Vikramjit Mitra, Zifang Huang, Colin Lea, Lauren Tooley, Sarah Wu, Darren Botten, Ashwini Palekar, Shrinath Thelapurath, Panayiotis G. Georgiou, Sachin Kajarekar, Jeffrey P. Bigham, 
Analysis and Tuning of a Voice Assistant System for Dysfluent Speech.

ICASSP2020 Sandeep Nallan Chakravarthula, Md. Nasir, Shao-Yen Tseng, Haoqi Li, Tae Jin Park, Brian R. Baucom, Craig J. Bryan, Shrikanth Narayanan, Panayiotis G. Georgiou
Automatic Prediction of Suicidal Risk in Military Couples Using Multimodal Interaction Cues from Couples Conversations.

ICASSP2020 Haoqi Li, Ming Tu, Jing Huang 0019, Shrikanth Narayanan, Panayiotis G. Georgiou
Speaker-Invariant Affective Representation Learning via Adversarial Training.

ICASSP2019 Nikolaos Flemotomos, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan, 
Role Specific Lattice Rescoring for Speaker Role Recognition from Speech Recognition Outputs.

Interspeech2019 Sandeep Nallan Chakravarthula, Haoqi Li, Shao-Yen Tseng, Maija Reblin, Panayiotis G. Georgiou
Predicting Behavior in Cancer-Afflicted Patient and Spouse Interactions Using Speech and Language.

Interspeech2019 Arindam Jati, Raghuveer Peri, Monisankha Pal, Tae Jin Park, Naveen Kumar 0004, Ruchir Travadi, Panayiotis G. Georgiou, Shrikanth Narayanan, 
Multi-Task Discriminative Training of Hybrid DNN-TVM Model for Speaker Verification with Noisy and Far-Field Speech.

Interspeech2019 Md. Nasir, Sandeep Nallan Chakravarthula, Brian R. W. Baucom, David C. Atkins, Panayiotis G. Georgiou, Shrikanth Narayanan, 
Modeling Interpersonal Linguistic Coordination in Conversations Using Word Mover's Distance.

Interspeech2019 Tae Jin Park, Manoj Kumar 0007, Nikolaos Flemotomos, Monisankha Pal, Raghuveer Peri, Rimita Lahiri, Panayiotis G. Georgiou, Shrikanth Narayanan, 
The Second DIHARD Challenge: System Description for USC-SAIL Team.

Interspeech2019 Tae Jin Park, Kyu J. Han, Jing Huang 0019, Xiaodong He 0001, Bowen Zhou, Panayiotis G. Georgiou, Shrikanth Narayanan, 
Speaker Diarization with Lexical Information.

Interspeech2019 Prashanth Gurunath Shivakumar, Mu Yang, Panayiotis G. Georgiou
Spoken Language Intent Detection Using Confusion2Vec.

Interspeech2019 Krishna Somandepalli, Naveen Kumar 0004, Arindam Jati, Panayiotis G. Georgiou, Shrikanth Narayanan, 
Multiview Shared Subspace Learning Across Speakers and Speech Commands.

ICASSP2018 Arindam Jati, Paula G. Williams, Brian R. Baucom, Panayiotis G. Georgiou
Towards Predicting Physiology from Speech During Stressful Conversations: Heart Rate and Respiratory Sinus Arrhythmia.

Interspeech2018 Sandeep Nallan Chakravarthula, Brian R. Baucom, Panayiotis G. Georgiou
Modeling Interpersonal Influence of Verbal Behavior in Couples Therapy Dyadic Interactions.

Interspeech2018 Arindam Jati, Panayiotis G. Georgiou
An Unsupervised Neural Prediction Framework for Learning Speaker Embeddings Using Recurrent Neural Networks.

Interspeech2018 Md. Nasir, Brian R. Baucom, Shrikanth S. Narayanan, Panayiotis G. Georgiou
Towards an Unsupervised Entrainment Distance in Conversational Speech Using Deep Neural Networks.

Interspeech2018 Tae Jin Park, Panayiotis G. Georgiou
Multimodal Speaker Segmentation and Diarization Using Lexical and Acoustic Cues via Sequence to Sequence Neural Networks.

ICASSP2017 Haoqi Li, Brian R. Baucom, Panayiotis G. Georgiou
Unsupervised latent behavior manifold learning from acoustic features: Audio2behavior.

Interspeech2017 James Gibson, Dogan Can, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan, 
Attention Networks for Modeling Behaviors in Addiction Counseling.

Interspeech2017 Arindam Jati, Panayiotis G. Georgiou
Speaker2Vec: Unsupervised Learning and Adaptation of a Speaker Manifold Using Deep Neural Networks with an Evaluation on Speaker Segmentation.

Interspeech2017 Karel Mundnich, Md. Nasir, Panayiotis G. Georgiou, Shrikanth S. Narayanan, 
Exploiting Intra-Annotator Rating Consistency Through Copeland's Method for Estimation of Ground Truth Labels in Couples' Therapy.

#115  | Nicholas Cummins | Google Scholar   DBLP
VenuesInterspeech: 27ICASSP: 2TASLP: 1SpeechComm: 1
Years2022: 22021: 32020: 72019: 52018: 42017: 82016: 2
ISCA Sectionspeech in health: 3special session: 3speech and language in health: 2speech emotion recognition: 2attention mechanism for speaker state recognition: 2speaker states and traits: 2diverse modes of speech acquisition and processing: 1alzheimer’s dementia recognition through spontaneous speech: 1social signals detection and speaker traits analysis: 1training strategy for speech emotion recognition: 1speech signal characterization: 1speech and language analytics for mental health: 1text analysis, multilingual issues and evaluation in speech synthesis: 1speech pathology, depression, and medical applications: 1speaker state and trait: 1show & tell: 1disorders related to speech and language: 1pathological speech and language: 1automatic assessment of emotions: 1
IEEE Keywordemotion recognition: 2signal representation: 1disentangled representation learning: 1guided representation learning: 1audio generation: 1audio signal processing: 1and generative adversarial neural network: 1electroencephalography: 1medical signal processing: 1recurrent neural nets: 1signal classification: 1human computer interaction: 1eeg signals: 1temporal convolutional networks: 1hierarchical attention mechanism: 1depression: 1mean square error methods: 1attention transfer: 1psychology: 1hierarchical attention: 1monotonic attention: 1behavioural sciences computing: 1speech recognition: 1
Most Publications2017: 272018: 222019: 182020: 152021: 11


Interspeech2022 Salvatore Fara, Stefano Goria, Emilia Molimpakis, Nicholas Cummins
Speech and the n-Back task as a lens into depression. How combining both may allow us to isolate different core symptoms of depression.

Interspeech2022 Bahman Mirheidari, André Bittar, Nicholas Cummins, Johnny Downs, Helen L. Fisher, Heidi Christensen, 
Automatic Detection of Expressed Emotion from Five-Minute Speech Samples: Challenges and Opportunities.

TASLP2021 Kazi Nazmul Haque, Rajib Rana, Jiajun Liu, John H. L. Hansen, Nicholas Cummins, Carlos Busso, Björn W. Schuller, 
Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation Using Fewer Labelled Audio Data.

ICASSP2021 Chao Li, Boyang Chen, Ziping Zhao 0001, Nicholas Cummins, Björn W. Schuller, 
Hierarchical Attention-Based Temporal Convolutional Networks for Eeg-Based Emotion Recognition.

Interspeech2021 Judith Dineley, Grace Lavelle, Daniel Leightley, Faith Matcham, Sara Siddi, Maria Teresa Peñarrubia-María, Katie M. White, Alina Ivan, Carolin Oetzmann, Sara Simblett, Erin Dawe-Lane, Stuart Bruce, Daniel Stahl, Yatharth Ranjan, Zulqarnain Rashid, Pauline Conde, Amos A. Folarin, Josep Maria Haro, Til Wykes, Richard J. B. Dobson, Vaibhav A. Narayan, Matthew Hotopf, Björn W. Schuller, Nicholas Cummins, RADAR-CNS Consortium, 
Remote Smartphone-Based Speech Collection: Acceptance and Barriers in Individuals with Major Depressive Disorder.

ICASSP2020 Ziping Zhao 0001, Zhongtian Bao, Zixing Zhang 0001, Nicholas Cummins, Haishuai Wang, Björn W. Schuller, 
Hierarchical Attention Transfer Networks for Depression Assessment from Speech.

Interspeech2020 Merlin Albes, Zhao Ren, Björn W. Schuller, Nicholas Cummins
Squeeze for Sneeze: Compact Neural Networks for Cold and Flu Recognition.

Interspeech2020 Alice Baird, Nicholas Cummins, Sebastian Schnieder, Jarek Krajewski, Björn W. Schuller, 
An Evaluation of the Effect of Anxiety on Speech - Computational Prediction of Anxiety from Sustained Vowels.

Interspeech2020 Nicholas Cummins, Yilin Pan, Zhao Ren, Julian Fritsch, Venkata Srikanth Nallanthighal, Heidi Christensen, Daniel Blackburn, Björn W. Schuller, Mathew Magimai-Doss, Helmer Strik, Aki Härmä, 
A Comparison of Acoustic and Linguistics Methodologies for Alzheimer's Dementia Recognition.

Interspeech2020 Adria Mallol-Ragolta, Nicholas Cummins, Björn W. Schuller, 
An Investigation of Cross-Cultural Semi-Supervised Learning for Continuous Affect Recognition.

Interspeech2020 Zhao Ren, Jing Han 0010, Nicholas Cummins, Björn W. Schuller, 
Enhancing Transferability of Black-Box Adversarial Attacks via Lifelong Learning for Speech Emotion Recognition Models.

Interspeech2020 Ziping Zhao 0001, Qifei Li, Nicholas Cummins, Bin Liu 0041, Haishuai Wang, Jianhua Tao, Björn W. Schuller, 
Hybrid Network Feature Extraction for Depression Assessment from Speech.

Interspeech2019 Alice Baird, Shahin Amiriparian, Nicholas Cummins, Sarah Sturmbauer, Johanna Janson, Eva-Maria Meßner, Harald Baumeister, Nicolas Rohleder, Björn W. Schuller, 
Using Speech to Predict Sequentially Measured Cortisol Levels During a Trier Social Stress Test.

Interspeech2019 Adria Mallol-Ragolta, Ziping Zhao 0001, Lukas Stappen, Nicholas Cummins, Björn W. Schuller, 
A Hierarchical Attention Network-Based Approach for Depression Detection from Transcribed Clinical Interviews.

Interspeech2019 Maximilian Schmitt, Nicholas Cummins, Björn W. Schuller, 
Continuous Emotion Recognition in Speech - Do We Need Recurrence?

Interspeech2019 Xinzhou Xu, Jun Deng, Nicholas Cummins, Zixing Zhang 0001, Li Zhao 0003, Björn W. Schuller, 
Autonomous Emotion Learning in Speech: A View of Zero-Shot Speech Emotion Recognition.

Interspeech2019 Ziping Zhao 0001, Zhongtian Bao, Zixing Zhang 0001, Nicholas Cummins, Haishuai Wang, Björn W. Schuller, 
Attention-Enhanced Connectionist Temporal Classification for Discrete Speech Emotion Recognition.

Interspeech2018 Shahin Amiriparian, Alice Baird, Sahib Julka, Alyssa Alcorn, Sandra Ottl, Suncica Petrovic, Eloise Ainger, Nicholas Cummins, Björn W. Schuller, 
Recognition of Echolalic Autistic Child Vocalisations Utilising Convolutional Recurrent Neural Networks.

Interspeech2018 Alice Baird, Emilia Parada-Cabaleiro, Simone Hantke, Felix Burkhardt, Nicholas Cummins, Björn W. Schuller, 
The Perception and Analysis of the Likeability and Human Likeness of Synthesized Speech.

Interspeech2018 Eva-Maria Rathner, Julia Djamali, Yannik Terhorst, Björn W. Schuller, Nicholas Cummins, Gudrun Salamon, Christina Hunger-Schoppe, Harald Baumeister, 
How Did You like 2017? Detection of Language Markers of Depression and Narcissism in Personal Narratives.

#116  | Emmanuel Vincent 0001 | Google Scholar   DBLP
VenuesInterspeech: 16ICASSP: 8TASLP: 6SpeechComm: 1
Years2022: 32021: 12020: 112019: 52018: 42017: 22016: 5
ISCA Sectionvoice privacy challenge: 3special session: 2trustworthy speech processing: 1multi-channel speech enhancement and hearing aids: 1diarization: 1speech processing and analysis: 1monaural source separation: 1asr model training and strategies: 1acoustic model adaptation for asr: 1speech enhancement: 1privacy in speech and audio interfaces: 1robust speech recognition: 1spatial and phase cues for source separation and speech recognition: 1
IEEE Keywordspeaker recognition: 4speech recognition: 4source separation: 4audio signal processing: 4speech enhancement: 3voice conversion: 2privacy: 2data privacy: 2pattern classification: 2signal classification: 2speech separation: 2speaker verification: 2weak labels: 2fusion: 2bayes methods: 2audio source separation: 2linkability: 1natural language processing: 1speaker anonymization: 1image classification: 1adversarial domain adaptation: 1acoustic scene classification: 1feature normalization: 1unsupervised learning: 1knowledge based systems: 1moment matching: 1iterative methods: 1localization: 1deflation: 1linkage attack: 1triplet loss: 1prototypical network: 1audio embedding.: 1audio tagging: 1signal representation: 1sound event detection: 1pattern recognition: 1acoustic signal detection: 1confidence intervals: 1jackknife estimates: 1i vector: 1data distortion: 1uncertainty propagation: 1robustness: 1reverberation: 1echo: 1residual echo suppression: 1filtering theory: 1acoustic echo cancellation: 1neural network: 1echo suppression: 1direction of arrival estimation: 1recurrent neural nets: 1array signal processing: 1high order ambisonics (hoa): 1lstm: 1spatial filters: 1multichannel filtering: 1viterbi detection: 1mirex evaluation campaign: 1multi criteria approach: 1structural segmentation: 1music structure estimation: 1viterbi algorithm: 1information retrieval: 1optimization: 1music: 1regularity constraint: 1feature simulation: 1asr: 1chime: 1data augmentation: 1error statistics: 1dnn: 1gradient methods: 1maximum likelihood estimation: 1estimation theory: 1variational bayes, model averaging: 1ensemble: 1singing voice extraction: 1deep neural networks: 1aggregation: 1non negative matrix factorization: 1expectation maximization (em): 1deep neural network (dnn): 1wiener filters: 1multichannel: 1expectation maximisation algorithm: 1
Most Publications2020: 232019: 192022: 182017: 172010: 17

Affiliations
Inria Nancy - Grand Est, Villers-lès-Nancy, France

TASLP2022 Brij Mohan Lal Srivastava, Mohamed Maouche, Md. Sahidullah, Emmanuel Vincent 0001, Aurélien Bellet, Marc Tommasi, Natalia A. Tomashenko, Xin Wang 0037, Junichi Yamagishi, 
Privacy and Utility of X-Vector Based Speaker Anonymization.

ICASSP2022 Michel Olvera, Emmanuel Vincent 0001, Gilles Gasso, 
On The Impact of Normalization Strategies in Unsupervised Adversarial Domain Adaptation for Acoustic Scene Classification.

Interspeech2022 Mohamed Maouche, Brij Mohan Lal Srivastava, Nathalie Vauquier, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent 0001
Enhancing Speech Privacy with Slicing.

Interspeech2021 Sunit Sivasankaran, Emmanuel Vincent 0001, Dominique Fohr, 
Explaining Deep Learning Models for Speech Enhancement.

ICASSP2020 Sunit Sivasankaran, Emmanuel Vincent 0001, Dominique Fohr, 
SLOGD: Speaker Location Guided Deflation Approach to Speech Separation.

ICASSP2020 Brij Mohan Lal Srivastava, Nathalie Vauquier, Md. Sahidullah, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent 0001
Evaluating Voice Conversion-Based Privacy Protection against Informed Attackers.

ICASSP2020 Nicolas Turpault, Romain Serizel, Emmanuel Vincent 0001
Limitations of Weak Labels for Embedding and Tagging.

Interspeech2020 Samuele Cornell, Maurizio Omologo, Stefano Squartini, Emmanuel Vincent 0001
Detecting and Counting Overlapping Speakers in Distant Speech Scenarios.

Interspeech2020 Mathieu Hu, Laurent Pierron, Emmanuel Vincent 0001, Denis Jouvet, 
Kaldi-Web: An Installation-Free, On-Device Speech Recognition System.

Interspeech2020 Mohamed Maouche, Brij Mohan Lal Srivastava, Nathalie Vauquier, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent 0001
A Comparative Study of Speech Anonymization Metrics.

Interspeech2020 Manuel Pariente, Samuele Cornell, Joris Cosentino, Sunit Sivasankaran, Efthymios Tzinis, Jens Heitkaemper, Michel Olvera, Fabian-Robert Stöter, Mathieu Hu, Juan M. Martín-Doñas, David Ditter, Ariel Frank, Antoine Deleforge, Emmanuel Vincent 0001
Asteroid: The PyTorch-Based Audio Source Separation Toolkit for Researchers.

Interspeech2020 Imran A. Sheikh, Emmanuel Vincent 0001, Irina Illina, 
On Semi-Supervised LF-MMI Training of Acoustic Models with Limited Data.

Interspeech2020 Brij Mohan Lal Srivastava, Natalia A. Tomashenko, Xin Wang 0037, Emmanuel Vincent 0001, Junichi Yamagishi, Mohamed Maouche, Aurélien Bellet, Marc Tommasi, 
Design Choices for X-Vector Based Speaker Anonymization.

Interspeech2020 Natalia A. Tomashenko, Brij Mohan Lal Srivastava, Xin Wang 0037, Emmanuel Vincent 0001, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Jose Patino 0001, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco, 
Introducing the VoicePrivacy Initiative.

Interspeech2020 M. A. Tugtekin Turan, Emmanuel Vincent 0001, Denis Jouvet, 
Achieving Multi-Accent ASR via Unsupervised Acoustic Model Adaptation.

SpeechComm2019 Nancy Bertin, Ewen Camberlein, Romain Lebarbenchon, Emmanuel Vincent 0001, Sunit Sivasankaran, Irina Illina, Frédéric Bimbot, 
VoiceHome-2, an extended corpus for multichannel speech processing in real homes.

TASLP2019 Annamaria Mesaros, Aleksandr Diment, Benjamin Elizalde, Toni Heittola, Emmanuel Vincent 0001, Bhiksha Raj, Tuomas Virtanen, 
Sound Event Detection in the DCASE 2017 Challenge.

ICASSP2019 Dayana Ribas, Emmanuel Vincent 0001
An Improved Uncertainty Propagation Method for Robust I-vector Based Speaker Recognition.

Interspeech2019 Manuel Pariente, Antoine Deleforge, Emmanuel Vincent 0001
A Statistically Principled and Computationally Efficient Approach to Speech Enhancement Using Variational Autoencoders.

Interspeech2019 Brij Mohan Lal Srivastava, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent 0001
Privacy-Preserving Adversarial Representation Learning in ASR: Reality or Illusion?

#117  | Chao Weng | Google Scholar   DBLP
VenuesICASSP: 18Interspeech: 13
Years2022: 72021: 102020: 82019: 42018: 2
ISCA Sectionsinging voice computing and processing in music: 2sequence models for asr: 2speaker embedding and diarization: 1acoustic event detection and classification: 1tools, corpora and resources: 1topics in asr: 1source separation, dereverberation and echo cancellation: 1asr model training and strategies: 1multi-channel speech enhancement: 1speech synthesis paradigms and methods: 1asr neural network training: 1
IEEE Keywordspeech recognition: 10speaker recognition: 9overlapped speech: 4end to end speech recognition: 4speech synthesis: 3natural language processing: 3speaker embedding: 3speaker diarization: 3voice activity detection: 3multi channel: 3speech enhancement: 3recurrent neural nets: 2text analysis: 2pattern clustering: 2multi look: 2direction of arrival estimation: 2source separation: 2unsupervised learning: 2automatic speech recognition: 2attention based model: 2graph neural network: 1conversational text to speech synthesis: 1speaking style: 1computational linguistics: 1code switched asr: 1bilingual asr: 1rnn t: 1overlap speech detection: 1inference mechanisms: 1speaker clustering: 1data handling: 1m2met: 1feature fusion: 1direction of arrival: 1synthetic speech detection: 1replay detection: 1res2net: 1multi scale feature: 1asv anti spoofing: 1uncertainty estimation: 1targetspeaker speech extraction: 1target speaker speech recognition: 1transformer: 1autoregressive processes: 1decoding: 1non autoregressive: 1ctc: 1source localization: 1microphone arrays: 1semi supervised learning: 1contrastive learning: 1data augmentation: 1self supervised learning: 1interference suppression: 1target speaker enhancement: 1robust speaker verification: 1speech separation: 1speaker verification: 1regression analysis: 1voice conversion: 1audio signal processing: 1singing synthesis: 1target speech extraction: 1signal reconstruction: 1minimisation: 1neural beamformer: 1self attention: 1persistent memory: 1dfsmn: 1cepstral mean normalization: 1channel normalization: 1robust automatic speech recognition: 1cepstral analysis: 1language model: 1code switching: 1
Most Publications: 2022: 22, 2021: 21, 2020: 16, 2019: 14, 2023: 8


ICASSP2022 Jingbei Li, Yi Meng, Chenyi Li, Zhiyong Wu 0001, Helen Meng, Chao Weng, Dan Su 0002, 
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling.

ICASSP2022 Brian Yan, Chunlei Zhang, Meng Yu 0003, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe 0001, Dong Yu 0001, 
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization.

ICASSP2022 Chunlei Zhang, Jiatong Shi, Chao Weng, Meng Yu 0003, Dong Yu 0001, 
Towards end-to-end Speaker Diarization with Generalized Neural Speaker Clustering.

ICASSP2022 Naijun Zheng, Na Li 0012, Xixin Wu, Lingwei Meng, Jiawen Kang, Haibin Wu, Chao Weng, Dan Su 0002, Helen Meng, 
The CUHK-Tencent Speaker Diarization System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.

ICASSP2022 Naijun Zheng, Na Li 0012, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
Multi-Channel Speaker Diarization Using Spatial Features for Meetings.

Interspeech2022 Xiaoyi Qin, Na Li 0012, Chao Weng, Dan Su 0002, Ming Li 0026, 
Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings.

Interspeech2022 Helin Wang, Dongchao Yang, Chao Weng, Jianwei Yu, Yuexian Zou, 
Improving Target Sound Extraction with Timestamp Information.

ICASSP2021 Xu Li, Na Li 0012, Chao Weng, Xunying Liu, Dan Su 0002, Dong Yu 0001, Helen Meng, 
Replay and Synthetic Speech Detection with Res2Net Architecture.

ICASSP2021 Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe 0001, Meng Yu 0003, Dong Yu 0001, 
Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation.

ICASSP2021 Xingchen Song, Zhiyong Wu 0001, Yiheng Huang, Chao Weng, Dan Su 0002, Helen M. Meng, 
Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input.

ICASSP2021 Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe 0001, Meng Yu 0003, Yong Xu 0004, Shi-Xiong Zhang, Dong Yu 0001, 
Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization.

ICASSP2021 Wei Xia, Chunlei Zhang, Chao Weng, Meng Yu 0003, Dong Yu 0001, 
Self-Supervised Text-Independent Speaker Verification Using Prototypical Momentum Contrastive Learning.

ICASSP2021 Chunlei Zhang, Meng Yu 0003, Chao Weng, Dong Yu 0001, 
Towards Robust Speaker Verification with Target Speaker Enhancement.

ICASSP2021 Naijun Zheng, Na Li 0012, Bo Wu, Meng Yu 0003, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
A Joint Training Framework of Multi-Look Separator and Speaker Embedding Extractor for Overlapped Speech.

Interspeech2021 Guoguo Chen, Shuzhou Chai, Guan-Bo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su 0002, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe 0001, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Zhao You, Zhiyong Yan, 
GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10,000 Hours of Transcribed Audio.

Interspeech2021 Max W. Y. Lam, Jun Wang 0091, Chao Weng, Dan Su 0002, Dong Yu 0001, 
Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition.

Interspeech2021 Helin Wang, Bo Wu, Lianwu Chen, Meng Yu 0003, Jianwei Yu, Yong Xu 0004, Shi-Xiong Zhang, Chao Weng, Dan Su 0002, Dong Yu 0001, 
TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation.

ICASSP2020 Chengqi Deng, Chengzhu Yu, Heng Lu 0004, Chao Weng, Dong Yu 0001, 
Pitchnet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network.

ICASSP2020 Aswin Shanmugam Subramanian, Chao Weng, Meng Yu 0003, Shi-Xiong Zhang, Yong Xu 0004, Shinji Watanabe 0001, Dong Yu 0001, 
Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives.

ICASSP2020 Zhao You, Dan Su 0002, Jie Chen 0057, Chao Weng, Dong Yu 0001, 
Dfsmn-San with Persistent Memory Model for Automatic Speech Recognition.

#118  | Shoukang Hu | Google Scholar   DBLP
Venues: Interspeech: 15, ICASSP: 10, TASLP: 6
Years: 2022: 7, 2021: 9, 2020: 3, 2019: 10, 2018: 2
ISCA Sections: speech recognition of atypical speech: 2; topics in asr: 2; medical applications and visual asr: 2; multi-, cross-lingual and other topics in asr: 1; miscellaneous topics in speech, voice and hearing disorders: 1; speech and language in health: 1; zero, low-resource and multi-modal speech recognition: 1; speech and speaker recognition: 1; asr neural network architectures: 1; lexicon and language model for speech recognition: 1; novel neural network architectures for acoustic modelling: 1; application of asr in medical practice: 1
IEEE Keywords: speech recognition: 14; bayes methods: 7; recurrent neural nets: 5; neural architecture search: 4; gaussian processes: 4; optimisation: 4; natural language processing: 4; deep learning (artificial intelligence): 3; bayesian learning: 3; speaker recognition: 3; language models: 3; quantisation (signal): 3; time delay neural network: 2; emotion recognition: 2; speech emotion recognition: 2; variational inference: 2; inference mechanisms: 2; speaker adaptation: 2; admm: 2; transformer: 2; gradient methods: 2; quantization: 2; neural net architecture: 1; search problems: 1; minimisation: 1; uncertainty handling: 1; model uncertainty: 1; monte carlo methods: 1; neural language models: 1; uniform sampling: 1; path dropout: 1; domain adaptation: 1; gaussian process: 1; lf mmi: 1; delays: 1; generalisation (artificial intelligence): 1; handicapped aids: 1; data augmentation: 1; multimodal speech recognition: 1; disordered speech recognition: 1; low bit quantization: 1; lstm rnn: 1; multi channel: 1; speech separation: 1; audio visual: 1; filtering theory: 1; jointly fine tuning: 1; microphone arrays: 1; visual occlusion: 1; overlapped speech recognition: 1; image recognition: 1; video signal processing: 1; elderly speech: 1; automatic speech recognition: 1; neurocognitive disorder detection: 1; dementia: 1; data compression: 1; recurrent neural networks: 1; alternating direction methods of multipliers: 1; activation function selection: 1; gaussian process neural network: 1; bayesian neural network: 1; lstm: 1; neural network language models: 1; parameter estimation: 1; capsule networks: 1; convolutional neural nets: 1; spatial relationship information: 1; recurrent connection: 1; utterance level features: 1; maximum likelihood estimation: 1; hidden markov models: 1; lhuc: 1; entropy: 1; rnnlms: 1; natural gradient: 1
Most Publications: 2022: 23, 2021: 16, 2019: 10, 2020: 9, 2023: 3


TASLP2022 Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.

TASLP2022 Boyang Xue, Shoukang Hu, Junhao Xu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Bayesian Neural Network Language Modeling for Speech Recognition.

ICASSP2022 Xixin Wu, Shoukang Hu, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Neural Architecture Search for Speech Emotion Recognition.

Interspeech2022 Mingyu Cui, Jiajun Deng, Shoukang Hu, Xurong Xie, Tianzi Wang, Shujie Hu, Mengzhe Geng, Boyang Xue, Xunying Liu, Helen Meng, 
Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems.

Interspeech2022 Tianzi Wang, Jiajun Deng, Mengzhe Geng, Zi Ye, Shoukang Hu, Yi Wang, Mingyu Cui, Zengrui Jin, Xunying Liu, Helen Meng, 
Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection.

Interspeech2022 Yi Wang, Tianzi Wang, Zi Ye, Lingwei Meng, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng, 
Exploring linguistic feature and model combination for speech recognition based automatic AD detection.

Interspeech2022 Junhao Xu, Shoukang Hu, Xunying Liu, Helen Meng, 
Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus.

TASLP2021 Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng, 
Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition.

TASLP2021 Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng, 
Recent Progress in the CUHK Dysarthric Speech Recognition System.

TASLP2021 Junhao Xu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng, 
Mixed Precision Low-Bit Quantization of Neural Network Language Models for Speech Recognition.

TASLP2021 Jianwei Yu, Shi-Xiong Zhang, Bo Wu, Shansong Liu, Shoukang Hu, Mengzhe Geng, Xunying Liu, Helen Meng, Dong Yu 0001, 
Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech.

ICASSP2021 Shoukang Hu, Xurong Xie, Shansong Liu, Mingyu Cui, Mengzhe Geng, Xunying Liu, Helen Meng, 
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.

ICASSP2021 Junhao Xu, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen Meng, 
Mixed Precision Quantization of Transformer Language Models for Speech Recognition.

ICASSP2021 Zi Ye, Shoukang Hu, Jinchao Li, Xurong Xie, Mengzhe Geng, Jianwei Yu, Junhao Xu, Boyang Xue, Shansong Liu, Xunying Liu, Helen Meng, 
Development of the Cuhk Elderly Speech Recognition System for Neurocognitive Disorder Detection Using the Dementiabank Corpus.

Interspeech2021 Jiajun Deng, Fabian Ritter Gutierrez, Shoukang Hu, Mengzhe Geng, Xurong Xie, Zi Ye, Shansong Liu, Jianwei Yu, Xunying Liu, Helen Meng, 
Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and Dysarthric Speech Recognition.

Interspeech2021 Mengzhe Geng, Shansong Liu, Jianwei Yu, Xurong Xie, Shoukang Hu, Zi Ye, Zengrui Jin, Xunying Liu, Helen Meng, 
Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition.

ICASSP2020 Junhao Xu, Xie Chen 0001, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen Meng, 
Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers.

Interspeech2020 Mengzhe Geng, Xurong Xie, Shansong Liu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng, 
Investigation of Data Augmentation Techniques for Disordered Speech Recognition.

Interspeech2020 Shansong Liu, Xurong Xie, Jianwei Yu, Shoukang Hu, Mengzhe Geng, Rongfeng Su, Shi-Xiong Zhang, Xunying Liu, Helen Meng, 
Exploiting Cross-Domain Visual Feature Generation for Disordered Speech Recognition.

ICASSP2019 Shoukang Hu, Max W. Y. Lam, Xurong Xie, Shansong Liu, Jianwei Yu, Xixin Wu, Xunying Liu, Helen Meng, 
Bayesian and Gaussian Process Neural Networks for Large Vocabulary Continuous Speech Recognition.

#119  | Hema A. Murthy | Google Scholar   DBLP
Venues: Interspeech: 20, ICASSP: 5, SpeechComm: 4, TASLP: 1
Years: 2021: 2, 2020: 6, 2019: 3, 2018: 9, 2017: 6, 2016: 4
ISCA Sections: show and tell: 3; speech synthesis: 2; cross/multi-lingual and code-switched asr: 1; speech synthesis paradigms and methods: 1; the zero resource speech challenge 2020: 1; neural signals for spoken communication: 1; the zero resource speech challenge 2019: 1; syllabification, rhythm, and voice activity detection: 1; speech segments and voice quality: 1; low resource speech recognition challenge for indian languages: 1; spoofing detection: 1; speaker state and trait: 1; speech technologies for code-switching in multilingual communities: 1; speech and audio segmentation and classification: 1; show & tell: 1; cognition and brain studies: 1; speaker diarization and recognition: 1
IEEE Keywords: speaker diarization: 2; information bottleneck: 2; speaker recognition: 2; varying length segment: 1; two pass system: 1; pattern clustering: 1; phoneme rate: 1; feedforward neural nets: 1; speaker discriminative features: 1; cpn sta model: 1; music: 1; carnatic music: 1; viterbi algorithm: 1; maximum likelihood estimation: 1; descriptive transcription: 1; transfer learning: 1; electroencephalography: 1; delta: 1; multitaper: 1; syllable: 1; speech: 1; bioelectric potentials: 1; eeg: 1; brain computer interfaces: 1; speech recognition: 1; neurophysiology: 1; medical signal processing: 1; calcium: 1; group delay: 1; fluorescence: 1; spike train: 1; ca2+ fluorescence: 1; hidden markov models: 1; speech synthesis: 1; pseudo syllables: 1; tts: 1; indian english: 1
Most Publications: 2020: 19, 2018: 17, 2017: 15, 2019: 12, 2013: 12


TASLP2021 Nauman Dawalatabad, Srikanth R. Madikeri, C. Chandra Sekhar, Hema A. Murthy
Novel Architectures for Unsupervised Information Bottleneck Based Speaker Diarization of Meetings.

Interspeech2021 Mari Ganesh Kumar, Jom Kuriakose, Anand Thyagachandran, Arun Kumar A, Ashish Seth, Lodagala Durga Prasad, Saish Jaiswal, Anusha Prakash 0001, Hema A. Murthy
Dual Script E2E Framework for Multilingual and Code-Switching ASR.

SpeechComm2020 Arun Baby, Jeena J. Prakash, Aswin Shanmugam Subramanian, Hema A. Murthy
Significance of spectral cues in automatic speech segmentation for Indian language speech synthesizers.

ICASSP2020 Venkata Subramanian Viraraghavan, Arpan Pal 0001, Hema A. Murthy, Rangarajan Aravind, 
State-Based Transcription of Components of Carnatic Music.

Interspeech2020 Mano Ranjith Kumar M., Sudhanshu Srivastava, Anusha Prakash 0001, Hema A. Murthy
A Hybrid HMM-Waveglow Based Text-to-Speech Synthesizer Using Histogram Equalization for Low Resource Indian Languages.

Interspeech2020 Anusha Prakash 0001, Hema A. Murthy
Generic Indic Text-to-Speech Synthesisers with Rapid Adaptation in an End-to-End Framework.

Interspeech2020 Karthik Pandia D. S, Anusha Prakash 0001, Mano Ranjith Kumar M., Hema A. Murthy
Exploration of End-to-End Synthesisers for Zero Resource Speech Challenge 2020.

Interspeech2020 Rini A. Sharon, Hema A. Murthy
The "Sound of Silence" in EEG - Cognitive Voice Activity Detection.

ICASSP2019 Nauman Dawalatabad, Srikanth R. Madikeri, C. Chandra Sekhar, Hema A. Murthy
Incremental Transfer Learning in Two-pass Information Bottleneck Based Speaker Diarization System for Meetings.

ICASSP2019 Rini A. Sharon, Shrikanth S. Narayanan, Mriganka Sur, Hema A. Murthy
An Empirical Study of Speech Processing in the Brain by Analyzing the Temporal Syllable Structure in Speech-input Induced EEG.

Interspeech2019 Karthik Pandia D. S, Hema A. Murthy
Zero Resource Speech Synthesis Using Transcripts Derived from Perceptual Acoustic Units.

Interspeech2018 Nauman Dawalatabad, Jom Kuriakose, Chellu Chandra Sekhar, Hema A. Murthy
Information Bottleneck Based Percussion Instrument Diarization System for Taniavartanam Segments of Carnatic Music Concerts.

Interspeech2018 Gayathri G, N. Mohana, Radhika Pal, Hema A. Murthy
Mobile Application for Learning Languages for the Unlettered.

Interspeech2018 G. R. Kasthuri, Prabha Ramanathan, Hema A. Murthy, Namita Jacob, Anil Prabhakar, 
Early Vocabulary Development Through Picture-based Software Solutions.

Interspeech2018 Mahesh M, Jeena J. Prakash, Hema A. Murthy
Resyllabification in Indian Languages and Its Implications in Text-to-speech Systems.

Interspeech2018 Srihari Maruthachalam, Sidharth Aggarwal, Mari Ganesh Kumar, Mriganka Sur, Hema A. Murthy
Brain-Computer Interface using Electroencephalogram Signatures of Eye Blinks.

Interspeech2018 Jeena J. Prakash, Rajan Golda Brunet, Hema A. Murthy
Transcription Correction for Indian Languages Using Acoustic Signatures.

Interspeech2018 M. S. Saranya, Hema A. Murthy
Decision-level Feature Switching as a Paradigm for Replay Attack Detection.

Interspeech2018 Jilt Sebastian, Manoj Kumar 0007, Pavan Kumar D. S., Mathew Magimai-Doss, Hema A. Murthy, Shrikanth S. Narayanan, 
Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech.

Interspeech2018 Anju Leela Thomas, Anusha Prakash 0001, Arun Baby, Hema A. Murthy
Code-switching in Indic Speech Synthesisers.

#120  | Sharon Gannot | Google Scholar   DBLP
Venues: TASLP: 18, ICASSP: 9, Interspeech: 2, SpeechComm: 1
Years: 2022: 1, 2021: 3, 2020: 9, 2019: 3, 2018: 5, 2017: 3, 2016: 6
ISCA Sections: source separation: 1; source separation, dereverberation and echo cancellation: 1
IEEE Keywords: reverberation: 18; array signal processing: 8; gaussian processes: 8; microphones: 8; speech enhancement: 8; audio signal processing: 5; fourier transforms: 5; direction of arrival estimation: 5; transfer functions: 5; bayes methods: 5; filtering theory: 5; source separation: 5; wiener filters: 5; microphone arrays: 4; array processing: 4; blind source separation: 4; maximum likelihood estimation: 4; dereverberation: 4; speaker recognition: 4; kalman filters: 4; covariance matrices: 3; relative transfer function (rtf): 3; mean square error methods: 3; signal denoising: 3; probability: 3; kalman smoother: 3; variational em: 3; expectation maximisation algorithm: 3; audio source separation: 3; estimation theory: 2; relative harmonic coefficients: 2; signal representation: 2; time frequency analysis: 2; matrix algebra: 2; noise reduction: 2; speaker tracking: 2; adaptive beamforming: 2; convolutive transfer function: 2; approximation theory: 2; spectral analysis: 2; gaussian process: 2; matrix decomposition: 2; sound source localization: 2; multiple speaker doa estimations: 1; reverberant environments: 1; decoupled doa estimation: 1; computationally efficient: 1; computational complexity: 1; direct path rhcs: 1; loudspeakers: 1; near field communication: 1; beamforming: 1; white noise: 1; near field: 1; superdirectivity: 1; white noise gain: 1; radio receivers: 1; polynomials: 1; relative transfer function: 1; system identification: 1; oblique projection: 1; source feature estimator: 1; gaussian process regression: 1; semi supervised multiple source localization: 1; multi mode gaussian process: 1; simplex: 1; blind audio source separation (bass): 1; eigenvalues and eigenfunctions: 1; beamformer: 1; spectral mask: 1; cramér rao bound (crb): 1; maximum likelihood estimation (mle): 1; factor graphs: 1; loopy belief propagation (lbp): 1; inference mechanisms: 1; graph theory: 1; speaker separation: 1; belief networks: 1; unsupervised learning: 1; unsupervised multiple source localization: 1; single source frame/bin detector: 1; signal detection: 1; image reconstruction: 1; clustering: 1; image representation: 1; autoencoders: 1; deep networks: 1; pattern clustering: 1; deep neural network: 1; narrowband noise: 1; single microphone: 1; minimisation: 1; mint: 1; short time fourier transform: 1; lasso optimization: 1; number of speakers estimation: 1; acoustic source localization: 1; em algorithm: 1; geometry: 1; diarization: 1; diffuse sound: 1; power spectral density estimation: 1; time difference of arrival (tdoa): 1; extended kalman filter (ekf): 1; time of arrival estimation: 1; nonlinear filters: 1; signal sampling: 1; convolution: 1; equalisers: 1; transient response: 1; multichannel identification: 1; multichannel equalization: 1; time domain analysis: 1; frequency response: 1; source localization: 1; supervised learning: 1; unidirectional microphones: 1; channel estimation: 1; time varying channels: 1; diffuse noise: 1; expectation maximization: 1; hidden markov models: 1; local gaussian model: 1; speaker diarisation: 1; mixture models: 1; mixmax model: 1; neural network: 1; phoneme classification: 1; time varying filters: 1; moving sources: 1; stochastic processes: 1; time varying mixing filters: 1; optimisation: 1; reproducing kernel hilbert space (rkhs): 1; manifold regularization: 1; diffusion distance: 1; direct path relative transfer function: 1; acoustic noise: 1; inter frame spectral subtraction: 1; regional statistics: 1; noise psd: 1; speech presence probability: 1
Most Publications: 2016: 29, 2015: 27, 2020: 26, 2018: 26, 2017: 25


TASLP2022 Yonggang Hu, Prasanga N. Samarasinghe, Sharon Gannot, Thushara D. Abhayapala, 
Decoupled Multiple Speaker Direction-of-Arrival Estimator Under Reverberant Environments.

TASLP2021 Dovid Y. Levin, Shmulik Markovich-Golan, Sharon Gannot
Near-Field Superdirectivity: An Analytical Perspective.

Interspeech2021 Aviad Eisenberg, Boaz Schwartz, Sharon Gannot
Online Blind Audio Source Separation Using Recursive Expectation-Maximization.

Interspeech2021 Yochai Yemini, Ethan Fetaya, Haggai Maron, Sharon Gannot
Scene-Agnostic Multi-Microphone Speech Dereverberation.

TASLP2020 Dani Cherkassky, Sharon Gannot
Successive Relative Transfer Function Identification Using Blind Oblique Projection.

TASLP2020 Yonggang Hu, Prasanga N. Samarasinghe, Sharon Gannot, Thushara D. Abhayapala, 
Semi-Supervised Multiple Source Localization Using Relative Harmonic Coefficients Under Noisy and Reverberant Environments.

TASLP2020 Bracha Laufer-Goldshtein, Ronen Talmon, Sharon Gannot
Global and Local Simplex Representations for Multichannel Source Separation.

TASLP2020 Yaron Laufer, Bracha Laufer-Goldshtein, Sharon Gannot
ML Estimation and CRBs for Reverberation, Speech, and Noise PSDs in Rank-Deficient Noise Field.

TASLP2020 Koby Weisberg, Bracha Laufer-Goldshtein, Sharon Gannot
Simultaneous Tracking and Separation of Multiple Sources Using Factor Graph Model.

ICASSP2020 Elior Hadad, Sharon Gannot
Maximum Likelihood Multi-Speaker Direction of Arrival Estimation Utilizing a Weighted Histogram.

ICASSP2020 Yonggang Hu, Prasanga N. Samarasinghe, Thushara D. Abhayapala, Sharon Gannot
Unsupervised Multiple Source Localization Using Relative Harmonic Coefficients.

ICASSP2020 Yaniv Opochinsky, Shlomo E. Chazan, Sharon Gannot, Jacob Goldberger, 
K-Autoencoders Deep Clustering.

ICASSP2020 Yochai Yemini, Shlomo E. Chazan, Jacob Goldberger, Sharon Gannot
A Composite DNN Architecture for Speech Enhancement.

TASLP2019 Yaron Laufer, Sharon Gannot
A Bayesian Hierarchical Model for Speech Enhancement With Time-Varying Audio Channel.

TASLP2019 Xiaofei Li 0001, Laurent Girin, Sharon Gannot, Radu Horaud, 
Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function.

ICASSP2019 Andreas Brendel, Bracha Laufer-Goldshtein, Sharon Gannot, Ronen Talmon, Walter Kellermann, 
Localization of an Unknown Number of Speakers in Adverse Acoustic Conditions Using Reliability Information and Diarization.

TASLP2018 Sebastian Braun, Adam Kuklasinski, Ofer Schwartz, Oliver Thiergart, Emanuël A. P. Habets, Sharon Gannot, Simon Doclo, Jesper Jensen 0001, 
Evaluation and Comparison of Late Reverberation Power Spectral Density Estimators.

TASLP2018 Bracha Laufer-Goldshtein, Ronen Talmon, Sharon Gannot
A Hybrid Approach for Speaker Tracking Based on TDOA and Data-Driven Models.

TASLP2018 Xiaofei Li 0001, Sharon Gannot, Laurent Girin, Radu Horaud, 
Multichannel Identification and Nonnegative Equalization for Dereverberation and Noise Reduction Based on Convolutive Transfer Function.

ICASSP2018 Bracha Laufer-Goldshtein, Ronen Talmon, Israel Cohen, Sharon Gannot
Multi-View Source Localization Based on Power Ratios.

#121  | Yannis Stylianou | Google Scholar   DBLP
Venues: Interspeech: 16, ICASSP: 11, TASLP: 2, SpeechComm: 1
Years: 2022: 2, 2021: 1, 2020: 2, 2019: 4, 2018: 5, 2017: 6, 2016: 10
ISCA Sections: speech synthesis: 3; speech enhancement: 3; speech coding and audio processing for noise reduction: 2; intelligibility-enhancing speech modification: 1; neural techniques for voice conversion and waveform generation: 1; robust speech recognition: 1; multimodal dialogue systems: 1; speech intelligibility and quality: 1; noise robust and far-field asr: 1; speech-enhancement: 1; speech quality & intelligibility: 1
IEEE Keywords: speech synthesis: 4; speech enhancement: 3; speech intelligibility: 2; reverberation: 2; speech recognition: 2; spoken dialogue management: 2; interactive systems: 2; signal classification: 2; support vector machines: 2; hidden markov models: 2; signal representation: 2; speech analysis: 2; near end listening enhancement: 1; greek harvard corpus: 1; hearing aids: 1; noisy speech: 1; noise: 1; backpropagation: 1; automatic speech recognition: 1; neural network: 1; unsupervised learning: 1; natural language interfaces: 1; dialogue policy: 1; ontologies (artificial intelligence): 1; multiple domains: 1; human computer interaction: 1; policy network: 1; expression transplantation: 1; expressive speech: 1; expressive speaker adaptation: 1; speaker recognition: 1; dnn: 1; emotion recognition: 1; random forests: 1; speech features: 1; audio signal processing: 1; ensemble classifiers: 1; random processes: 1; regression analysis: 1; gaussian processes: 1; user satisfaction: 1; dialogue quality: 1; acoustic features: 1; face recognition: 1; expression adaptation: 1; deep neural network: 1; pattern clustering: 1; expressive visual text to speech: 1; power dynamics recovery: 1; adaptive control: 1; time warp: 1; speech modification: 1; gain control: 1; mean square error methods: 1; dynamic range compression: 1; public address system: 1; spectral shaping: 1; speech intelligibility enhancement: 1; phase coding: 1; matrix algebra: 1; complex valued neural network: 1; complex amplitude: 1; phase modelling: 1; unvoiced speech: 1; signal reconstruction: 1; time frequency analysis: 1; extended adaptive quasi harmonic model: 1; transforms: 1; sinusoidal model: 1; iterative methods: 1; speech representation: 1; cepstral analysis: 1; phase estimation: 1; over smoothing: 1; spectral analysis: 1; decision trees: 1; factorised speech representation: 1; hmm based speech synthesis: 1; sub band: 1
Most Publications: 2014: 18, 2016: 15, 2013: 14, 2009: 13, 2018: 12


TASLP2022 P. V. Muhammed Shifas, Catalin Zorila, Yannis Stylianou
End-to-End Neural Based Modification of Noisy Speech for Speech-in-Noise Intelligibility Improvement.

Interspeech2022 Tuomo Raitio, Petko Petkov, Jiangchuan Li, P. V. Muhammed Shifas, Andrea Davis, Yannis Stylianou
Vocal effort modeling in neural TTS for improving the intelligibility of synthetic speech in noise.

Interspeech2021 Dipjyoti Paul, Sankar Mukherjee, Yannis Pantazis, Yannis Stylianou
A Universal Multi-Speaker Multi-Style Text-to-Speech via Disentangled Representation Learning Based on Rényi Divergence Minimization.

Interspeech2020 Dipjyoti Paul, Yannis Pantazis, Yannis Stylianou
Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions.

Interspeech2020 Dipjyoti Paul, P. V. Muhammed Shifas, Yannis Pantazis, Yannis Stylianou
Enhancing Speech Intelligibility in Text-To-Speech Synthesis Using Speaking Style Conversion.

ICASSP2019 Petko Nikolov Petkov, Vasileios Tsiaras, Rama Doddipatla, Yannis Stylianou
An Unsupervised Learning Approach to Neural-net-supported Wpe Dereverberation.

Interspeech2019 Nagaraj Adiga, Yannis Pantazis, Vassilis Tsiaras, Yannis Stylianou
Speech Enhancement for Noise-Robust Speech Synthesis Using Wasserstein GAN.

Interspeech2019 Dipjyoti Paul, Yannis Pantazis, Yannis Stylianou
Non-Parallel Voice Conversion Using Weighted Generative Adversarial Networks.

Interspeech2019 P. V. Muhammed Shifas, Nagaraj Adiga, Vassilis Tsiaras, Yannis Stylianou
A Non-Causal FFTNet Architecture for Speech Enhancement.

ICASSP2018 Alexandros Papangelis, Margarita Kotti, Yannis Stylianou
Towards Scalable Information-Seeking Multi-Domain Dialogue.

ICASSP2018 Jonathan Parker, Yannis Stylianou, Roberto Cipolla, 
Adaptation of an Expressive Single Speaker Deep Neural Network Speech Synthesis System.

Interspeech2018 Cong-Thanh Do, Yannis Stylianou
Weighting Time-Frequency Representation of Speech Using Auditory Saliency for Automatic Speech Recognition.

Interspeech2018 Margarita Kotti, Vassilios Diakoloukas, Alexandros Papangelis, Michail Lagoudakis, Yannis Stylianou
A Case Study on the Importance of Belief State Representation for Dialogue Policy Management.

Interspeech2018 P. V. Muhammed Shifas, Vassilis Tsiaras, Yannis Stylianou
Speech Intelligibility Enhancement Based on a Non-causal Wavenet-like Model.

ICASSP2017 Margarita Kotti, Yannis Stylianou
Effective emotion recognition in movie audio tracks.

ICASSP2017 Alexandros Papangelis, Margarita Kotti, Yannis Stylianou
Predicting dialogue success, naturalness, and length with acoustic features.

ICASSP2017 Jonathan Parker, Ranniery Maia, Yannis Stylianou, Roberto Cipolla, 
Expressive visual text to speech and expression adaptation using deep neural networks.

ICASSP2017 Petko Nikolov Petkov, Yannis Stylianou
Adaptive gain control and time warp for enhanced speech intelligibility under reverberation.

Interspeech2017 Cong-Thanh Do, Yannis Stylianou
Improved Automatic Speech Recognition Using Subband Temporal Envelope Features and Time-Delay Neural Network Denoising Autoencoder.

Interspeech2017 Tudor-Catalin Zorila, Yannis Stylianou
On the Quality and Intelligibility of Noisy Speech Processed for Near-End Listening Enhancement.

#122  | Hirokazu Kameoka | Google Scholar   DBLP
Venues: Interspeech: 16, ICASSP: 10, TASLP: 4
Years: 2022: 4, 2021: 3, 2020: 4, 2019: 5, 2018: 1, 2017: 9, 2016: 4
ISCA Sections: speech synthesis: 2; voice conversion and adaptation: 2; speech synthesis prosody: 2; voice conversion: 2; co-inference of production and acoustics: 2; speech synthesis paradigms and methods: 1; neural techniques for voice conversion and waveform generation: 1; speech and audio source separation and scene analysis: 1; speech-enhancement: 1; wavenet and novel paradigms: 1; speech enhancement and noise reduction: 1
IEEE Keywords: voice conversion (vc): 5; speech synthesis: 4; generative adversarial networks (gans): 3; non parallel vc: 3; speech coding: 2; convolutional neural nets: 2; vocoders: 2; audio signal processing: 2; source separation: 2; recurrent neural nets: 2; sequence to sequence: 2; voice conversion: 2; speech recognition: 2; cyclegan vc: 2; cyclegan: 2; speech enhancement: 2; iterative methods: 2; convolutional neural network: 1; time frequency analysis: 1; spectral analysis: 1; waveform synthesis: 1; inverse short time fourier transform: 1; fourier transforms: 1; inverse transforms: 1; mel spectrogram vocoder: 1; generative adversarial networks: 1; frequency domain analysis: 1; hungarian algorithm: 1; multichannel source separation: 1; block permutation: 1; audio inpainting: 1; transformer: 1; pretraining: 1; mel spectrogram conversion: 1; stargan: 1; a stargan: 1; multi domain vc: 1; nonparallel vc: 1; emotion recognition: 1; natural language processing: 1; many to many vc: 1; fully convolutional model: 1; sequence to sequence learning: 1; speaker recognition: 1; attention: 1; decoding: 1; auxiliary classifier vae (acvae): 1; variational autoencoder (vae): 1; fully convolutional network: 1; cepstral analysis: 1; gaussian processes: 1; mixture models: 1; context preservation mechanism: 1; attention mechanism: 1; singing voice: 1; convolution: 1; voice contour: 1; variational autoencoder: 1; gated convolutional network: 1; deep generative model: 1; feedforward neural nets: 1; complex nmf: 1; minimisation: 1; matrix decomposition: 1; non negative matrix factorization (nmf): 1; generalized kullback leibler (kl) divergence: 1; audio source separation: 1; prosodic information processing: 1; voice fundamental frequency contour: 1; em algorithm: 1; fujisaki model: 1; expectation maximisation algorithm: 1; probability: 1; microphones: 1; tensors: 1; noise suppression: 1; non negative matrix factorization: 1; nonaudible murmur: 1; silent speech communication: 1; external noise monitoring: 1; noise: 1; generative model: 1; product of experts: 1; f0 prediction: 1; electrolaryngeal speech enhancement: 1
Most Publications: 2018: 20, 2019: 19, 2014: 18, 2017: 17, 2021: 13


ICASSP2022 Takuhiro Kaneko, Kou Tanaka, Hirokazu Kameoka, Shogo Seki, 
ISTFTNET: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform.

ICASSP2022 Li Li 0063, Hirokazu Kameoka, Shogo Seki, 
HBP: An Efficient Block Permutation Solver Using Hungarian Algorithm and Spectrogram Inpainting for Multichannel Audio Source Separation.

Interspeech2022 Hirokazu Kameoka, Takuhiro Kaneko, Shogo Seki, Kou Tanaka, 
CAUSE: Crossmodal Action Unit Sequence Estimation from Speech.

Interspeech2022 Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Shogo Seki, 
MISRNet: Lightweight Neural Vocoder Using Multi-Input Single Shared Residual Blocks.

TASLP2021 Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda, 
Pretraining Techniques for Sequence-to-Sequence Voice Conversion.

ICASSP2021 Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo, 
Maskcyclegan-VC: Learning Non-Parallel Voice Conversion with Filling in Frames.

Interspeech2021 Shoki Sakamoto, Akira Taniguchi, Tadahiro Taniguchi, Hirokazu Kameoka
StarGAN-VC+ASR: StarGAN-Based Non-Parallel Voice Conversion Regularized by Automatic Speech Recognition.

TASLP2020 Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo, 
Nonparallel Voice Conversion With Augmented Classifier Star Generative Adversarial Networks.

TASLP2020 Hirokazu Kameoka, Kou Tanaka, Damian Kwasny, Takuhiro Kaneko, Nobukatsu Hojo, 
ConvS2S-VC: Fully Convolutional Sequence-to-Sequence Voice Conversion.

Interspeech2020 Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda, 
Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining.

Interspeech2020 Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo, 
CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-Spectrogram Conversion.

TASLP2019 Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo, 
ACVAE-VC: Non-Parallel Voice Conversion With Auxiliary Classifier Variational Autoencoder.

ICASSP2019 Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo, 
Cyclegan-VC2: Improved Cyclegan-based Non-parallel Voice Conversion.

ICASSP2019 Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Nobukatsu Hojo, 
ATTS2S-VC: Sequence-to-sequence Voice Conversion with Attention and Context Preservation Mechanisms.

Interspeech2019 Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo, 
StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion.

Interspeech2019 Dongxiao Wang, Hirokazu Kameoka, Koichi Shinoda, 
A Modified Algorithm for Multiple Input Spectrogram Inversion.

ICASSP2018 Kou Tanaka, Hirokazu Kameoka, Kazuho Morikawa, 
Vae-Space: Deep Generative Model of Voice Fundamental Frequency Contours.

ICASSP2017 Hirokazu Kameoka, Hideaki Kagami, Masahiro Yukawa, 
Complex NMF with the generalized Kullback-Leibler divergence.

ICASSP2017 Ryotaro Sato, Hirokazu Kameoka, Kunio Kashino, 
Fast algorithm for statistical phrase/accent command estimation based on generative model incorporating spectral features.

ICASSP2017 Yusuke Tajiri, Hirokazu Kameoka, Tomoki Toda, 
A noise suppression method for body-conducted soft speech based on non-negative tensor factorization of air- and body-conducted signals.

#123  | Meng Yu 0003 | Google Scholar   DBLP
Venues: Interspeech: 17; ICASSP: 11; TASLP: 2
Years: 2022: 4; 2021: 11; 2020: 6; 2019: 6; 2018: 3
ISCA Section: source separation, dereverberation and echo cancellation: 2; multi-channel speech enhancement: 2; speech and audio source separation and scene analysis: 2; asr for noisy and far-field speech: 2; deep learning for source separation and pitch tracking: 2; dereverberation and echo cancellation: 1; source separation: 1; speech localization, enhancement, and quality assessment: 1; speech synthesis paradigms and methods: 1; multimodal speech processing: 1; speech enhancement: 1; topics in speech recognition: 1
IEEE Keyword: speech recognition: 8; speaker recognition: 7; speech enhancement: 5; speaker embedding: 3; speech separation: 3; source separation: 3; reverberation: 2; pattern clustering: 2; end to end speech recognition: 2; overlapped speech: 2; application program interfaces: 1; acoustic environment: 1; speech simulation: 1; graphics processing units: 1; transient response: 1; computational linguistics: 1; code switched asr: 1; bilingual asr: 1; natural language processing: 1; rnn t: 1; speaker diarization: 1; voice activity detection: 1; overlap speech detection: 1; inference mechanisms: 1; speaker clustering: 1; audio visual systems: 1; audio visual processing: 1; sensor fusion: 1; speech synthesis: 1; audio signal processing: 1; sound source separation: 1; adl mvdr: 1; array signal processing: 1; mvdr: 1; recurrent neural nets: 1; uncertainty estimation: 1; target speaker speech extraction: 1; target speaker speech recognition: 1; source localization: 1; microphone arrays: 1; direction of arrival estimation: 1; semi supervised learning: 1; contrastive learning: 1; data augmentation: 1; text analysis: 1; unsupervised learning: 1; self supervised learning: 1; interference suppression: 1; target speaker enhancement: 1; robust speaker verification: 1; multi channel: 1; multi look: 1; speaker verification: 1; multi channel speech separation: 1; spatial features: 1; end to end: 1; spatial filters: 1; filtering theory: 1; inter channel convolution differences: 1; target speech extraction: 1; signal reconstruction: 1; minimisation: 1; neural beamformer: 1; optimisation: 1; siamese neural networks: 1; end to end speaker verification: 1; seq2seq attention: 1; text dependent: 1
Most Publications: 2021: 23; 2020: 19; 2019: 19; 2022: 12; 2018: 4


ICASSP2022 Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu 0003, Zhenyu Tang 0001, Dinesh Manocha, Dong Yu 0001, 
Fast-Rir: Fast Neural Diffuse Room Impulse Response Generator.

ICASSP2022 Brian Yan, Chunlei Zhang, Meng Yu 0003, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe 0001, Dong Yu 0001, 
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization.

ICASSP2022 Chunlei Zhang, Jiatong Shi, Chao Weng, Meng Yu 0003, Dong Yu 0001, 
Towards end-to-end Speaker Diarization with Generalized Neural Speaker Clustering.

Interspeech2022 Vinay Kothapally, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Dong Yu 0001, 
Joint Neural AEC and Beamforming with Double-Talk Detection.

TASLP2021 Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu 0004, Meng Yu 0003, Dong Yu 0001, Jesper Jensen 0001, 
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation.

TASLP2021 Zhuohuang Zhang, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Lianwu Chen, Donald S. Williamson, Dong Yu 0001, 
Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation.

ICASSP2021 Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe 0001, Meng Yu 0003, Dong Yu 0001, 
Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation.

ICASSP2021 Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe 0001, Meng Yu 0003, Yong Xu 0004, Shi-Xiong Zhang, Dong Yu 0001, 
Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization.

ICASSP2021 Wei Xia, Chunlei Zhang, Chao Weng, Meng Yu 0003, Dong Yu 0001, 
Self-Supervised Text-Independent Speaker Verification Using Prototypical Momentum Contrastive Learning.

ICASSP2021 Chunlei Zhang, Meng Yu 0003, Chao Weng, Dong Yu 0001, 
Towards Robust Speaker Verification with Target Speaker Enhancement.

ICASSP2021 Naijun Zheng, Na Li 0012, Bo Wu, Meng Yu 0003, Jianwei Yu, Chao Weng, Dan Su 0002, Xunying Liu, Helen Meng, 
A Joint Training Framework of Multi-Look Separator and Speaker Embedding Extractor for Overlapped Speech.

Interspeech2021 Xiyun Li, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Jiaming Xu 0001, Bo Xu 0002, Dong Yu 0001, 
MIMO Self-Attentive RNN Beamformer for Multi-Speaker Speech Separation.

Interspeech2021 Helin Wang, Bo Wu, Lianwu Chen, Meng Yu 0003, Jianwei Yu, Yong Xu 0004, Shi-Xiong Zhang, Chao Weng, Dan Su 0002, Dong Yu 0001, 
TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation.

Interspeech2021 Yong Xu 0004, Zhuohuang Zhang, Meng Yu 0003, Shi-Xiong Zhang, Dong Yu 0001, 
Generalized Spatio-Temporal RNN Beamformer for Target Speech Separation.

Interspeech2021 Meng Yu 0003, Chunlei Zhang, Yong Xu 0004, Shi-Xiong Zhang, Dong Yu 0001, 
MetricNet: Towards Improved Modeling For Non-Intrusive Speech Quality Assessment.

ICASSP2020 Rongzhi Gu, Shi-Xiong Zhang, Lianwu Chen, Yong Xu 0004, Meng Yu 0003, Dan Su 0002, Yuexian Zou, Dong Yu 0001, 
Enhancing End-to-End Multi-Channel Speech Separation Via Spatial Feature Learning.

ICASSP2020 Aswin Shanmugam Subramanian, Chao Weng, Meng Yu 0003, Shi-Xiong Zhang, Yong Xu 0004, Shinji Watanabe 0001, Dong Yu 0001, 
Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives.

Interspeech2020 Meng Yu 0003, Xuan Ji, Bo Wu, Dan Su 0002, Dong Yu 0001, 
End-to-End Multi-Look Keyword Spotting.

Interspeech2020 Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Lianwu Chen, Chao Weng, Jianming Liu, Dong Yu 0001, 
Neural Spatio-Temporal Beamformer for Target Speech Separation.

Interspeech2020 Chengzhu Yu, Heng Lu 0004, Na Hu, Meng Yu 0003, Chao Weng, Kun Xu 0005, Peng Liu, Deyi Tuo, Shiyin Kang, Guangzhi Lei, Dan Su 0002, Dong Yu 0001, 
DurIAN: Duration Informed Attention Network for Speech Synthesis.

#124  | Jasha Droppo | Google Scholar   DBLP
Venues: Interspeech: 14; ICASSP: 13; TASLP: 2; KDD: 1
Years: 2022: 4; 2021: 13; 2020: 1; 2018: 3; 2017: 4; 2016: 5
ISCA Section: privacy-preserving machine learning for audio & speech processing: 2; speaker recognition and anti-spoofing: 1; inclusive and fair speech technologies: 1; neural network training methods for asr: 1; non-native speech: 1; robust speaker recognition: 1; multi- and cross-lingual asr, other topics in asr: 1; self-supervision and semi-supervision for neural asr training: 1; resource-constrained asr: 1; speech synthesis: 1; training strategies for asr: 1; conversational telephone speech recognition: 1; neural networks in speech recognition: 1
IEEE Keyword: speech recognition: 11; recurrent neural nets: 5; speaker recognition: 4; natural language processing: 4; blstm: 3; conversational speech recognition: 3; lace: 3; optimisation: 2; entropy: 2; unsupervised single channel overlapped speech recognition: 2; permutation invariant training: 2; feedforward neural nets: 2; resnet: 2; vgg: 2; recurrent neural networks: 2; convolutional neural networks: 2; model fairness: 1; score fusion: 1; embedding adaptation: 1; speaker verification: 1; deep learning (artificial intelligence): 1; end to end slu: 1; top down attention: 1; recurrent neural network transducer: 1; multilingual: 1; language identification: 1; joint modeling: 1; code switching: 1; neural interfaces: 1; reinforce: 1; multitask training: 1; spoken language understanding: 1; natural language interfaces: 1; keyword spotting: 1; speech synthesis: 1; data augmentation: 1; sequence discriminative training: 1; transfer learning: 1; unsupervised learning: 1; progressive joint training: 1; correlation methods: 1; language model: 1; temporal correlation modeling: 1; system combination: 1; lstm lm: 1; convolution: 1; human parity: 1; cnn: 1; spatial smoothing: 1; smoothing methods: 1; neural net architecture: 1; iterative methods: 1; recurrent neural network: 1; end to end training: 1; ctc: 1; deep neural network: 1; stochastic gradient descent: 1; gradient methods: 1; scaling: 1; self stabilizer: 1; learning rate: 1; acoustic modeling: 1; linear augmented network: 1; pre training: 1; deep network: 1; hidden markov models: 1; lstm: 1; acoustic model: 1; dnn: 1; decoding: 1; vocabulary: 1; wfst decoder: 1; search problems: 1; parallel algorithms: 1; parallel viterbi: 1; large vocabulary: 1
Most Publications: 2021: 23; 2022: 13; 2016: 10; 2017: 9; 2002: 8

Affiliations
Microsoft Research

ICASSP2022 Hua Shen, Yuguang Yang 0004, Guoli Sun, Ryan Langman, Eunjung Han, Jasha Droppo, Andreas Stolcke, 
Improving Fairness in Speaker Verification via Group-Adapted Fusion Network.

Interspeech2022 Minho Jin, Chelsea Ju, Zeya Chen, Yi-Chieh Liu, Jasha Droppo, Andreas Stolcke, 
Adversarial Reweighting for Speaker Verification Fairness.

Interspeech2022 Viet Anh Trinh, Pegah Ghahremani, Brian King, Jasha Droppo, Andreas Stolcke, Roland Maas, 
Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation.

KDD2022 Gopinath Chennupati, Milind Rao, Gurpreet Chadha, Aaron Eakin, Anirudh Raju, Gautam Tiwari, Anit Kumar Sahu, Ariya Rastrow, Jasha Droppo, Andy Oberlin, Buddha Nandanoor, Prahalad Venkataramanan, Zheng Wu, Pankaj Sitpure, 
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale.

ICASSP2021 Yixin Chen 0003, Weiyi Lu, Alejandro Mottini, Li Erran Li, Jasha Droppo, Zheng Du, Belinda Zeng, 
Top-Down Attention in End-to-End Spoken Language Understanding.

ICASSP2021 Surabhi Punjabi, Harish Arsikere, Zeynab Raeesy, Chander Chandak, Nikhil Bhave, Ankish Bansal, Markus Müller, Sergio Murillo, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Sri Garimella, Roland Maas, Mat Hans, Athanasios Mouchtaris, Siegfried Kunzmann, 
Joint ASR and Language Identification Using RNN-T: An Efficient Approach to Dynamic Language Switching.

ICASSP2021 Milind Rao, Pranav Dheram, Gautam Tiwari, Anirudh Raju, Jasha Droppo, Ariya Rastrow, Andreas Stolcke, 
DO as I Mean, Not as I Say: Sequence Loss Training for Spoken Language Understanding.

ICASSP2021 Andrew Werchniak, Roberto Barra-Chicote, Yuriy Mishchenko, Jasha Droppo, Jeff Condal, Peng Liu, Anish Shah, 
Exploring the application of synthetic audio in training keyword spotters.

Interspeech2021 Jasha Droppo, Oguz Elibol, 
Scaling Laws for Acoustic Models.

Interspeech2021 Amin Fazel, Wei Yang, Yulan Liu, Roberto Barra-Chicote, Yixiong Meng, Roland Maas, Jasha Droppo
SynthASR: Unlocking Synthetic Data for Speech Recognition.

Interspeech2021 Daniel Korzekwa, Roberto Barra-Chicote, Szymon Zaporowski, Grzegorz Beringer, Jaime Lorenzo-Trueba, Alicja Serafinowicz, Jasha Droppo, Thomas Drugman, Bozena Kostek, 
Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention.

Interspeech2021 Jie Pu, Yuguang Yang 0004, Ruirui Li, Oguz Elibol, Jasha Droppo
Scaling Effect of Self-Supervised Speech Models.

Interspeech2021 Swayambhu Nath Ray, Minhua Wu, Anirudh Raju, Pegah Ghahremani, Raghavendra Bilgi, Milind Rao, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Jasha Droppo
Listen with Intent: Improving Speech Recognition with Audio-to-Intent Front-End.

Interspeech2021 Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas, 
wav2vec-C: A Self-Supervised Model for Speech Representation Learning.

Interspeech2021 Muhammad A. Shah, Joseph Szurley, Markus Müller, Athanasios Mouchtaris, Jasha Droppo
Evaluating the Vulnerability of End-to-End Automatic Speech Recognition Models to Membership Inference Attacks.

Interspeech2021 Rupak Vignesh Swaminathan, Brian King, Grant P. Strimel, Jasha Droppo, Athanasios Mouchtaris, 
CoDERT: Distilling Encoder Representations with Co-Learning for Transducer-Based Speech Recognition.

Interspeech2021 Iván Vallés-Pérez, Julian Roth, Grzegorz Beringer, Roberto Barra-Chicote, Jasha Droppo
Improving Multi-Speaker TTS Prosody Variance with a Residual Encoder and Normalizing Flows.

Interspeech2020 Jinxi Guo, Gautam Tiwari, Jasha Droppo, Maarten Van Segbroeck, Che-Wei Huang, Andreas Stolcke, Roland Maas, 
Efficient Minimum Word Error Rate Training of RNN-Transducer for End-to-End Speech Recognition.

TASLP2018 Zhehuai Chen, Jasha Droppo, Jinyu Li 0001, Wayne Xiong, 
Progressive Joint Modeling in Unsupervised Single-Channel Overlapped Speech Recognition.

ICASSP2018 Zhehuai Chen, Jasha Droppo
Sequence Modeling in Unsupervised Single-Channel Overlapped Speech Recognition.

#125  | Massimiliano Todisco | Google Scholar   DBLP
Venues: Interspeech: 21; ICASSP: 8; TASLP: 1
Years: 2022: 2; 2021: 7; 2020: 5; 2019: 5; 2018: 5; 2017: 3; 2016: 3
ISCA Section: voice anti-spoofing and countermeasure: 2; voice privacy challenge: 2; speaker recognition evaluation: 2; speaker recognition: 2; special session: 2; robust speaker recognition: 1; privacy-preserving machine learning for audio & speech processing: 1; the first dicova challenge: 1; graph and end-to-end learning for speaker recognition: 1; anti-spoofing and liveness detection: 1; privacy in speech and audio interfaces: 1; the 2019 automatic speaker verification spoofing and countermeasures challenge: 1; novel approaches to enhancement: 1; spoken corpora and annotation: 1; speaker verification: 1; robust speaker recognition and anti-spoofing: 1
IEEE Keyword: speaker recognition: 4; presentation attack detection: 3; artificial bandwidth extension: 3; speech quality: 3; automatic speaker verification: 2; spoofing: 2; security of data: 2; variational auto encoder: 2; latent variable: 2; speech coding: 2; medical signal processing: 1; recurrent neural nets: 1; diseases: 1; auditory acoustic features: 1; covid 19: 1; audio signal processing: 1; bi lstm: 1; patient diagnosis: 1; respiratory sounds: 1; data augmentation: 1; filtering theory: 1; transient response: 1; anti spoofing: 1; signal classification: 1; countermeasures: 1; public domain software: 1; spoofing counter measures: 1; automatic speaker verification (asv): 1; detection cost function: 1; mean square error methods: 1; generative adversarial network: 1; statistical distributions: 1; telephony: 1; speech recognition: 1; regression analysis: 1; dimensionality reduction: 1; bandwidth extension: 1; speech codecs: 1; voice quality: 1; spectral analysis: 1; super wideband: 1; information theory: 1; computational complexity: 1; gaussian mixture model: 1; replay: 1; speaker verification: 1
Most Publications: 2021: 20; 2022: 16; 2020: 16; 2019: 14; 2018: 9

ICASSP2022 Madhu R. Kamble, Jose Patino 0001, Maria A. Zuluaga, Massimiliano Todisco
Exploring Auditory Acoustic Features for The Diagnosis of Covid-19.

ICASSP2022 Hemlata Tak, Madhu R. Kamble, Jose Patino 0001, Massimiliano Todisco, Nicholas W. D. Evans, 
Rawboost: A Raw Data Boosting and Augmentation Method Applied to Automatic Speaker Verification Anti-Spoofing.

ICASSP2021 Hemlata Tak, Jose Patino 0001, Massimiliano Todisco, Andreas Nautsch, Nicholas W. D. Evans, Anthony Larcher, 
End-to-End anti-spoofing with RawNet2.

Interspeech2021 Jose Patino 0001, Natalia A. Tomashenko, Massimiliano Todisco, Andreas Nautsch, Nicholas W. D. Evans, 
Speaker Anonymisation Using the McAdams Coefficient.

Interspeech2021 Oubaïda Chouchane, Baptiste Brossier, Jorge Esteban Gamboa Gamboa, Thomas Lardy, Hemlata Tak, Orhan Ermis, Madhu R. Kamble, Jose Patino 0001, Nicholas W. D. Evans, Melek Önen, Massimiliano Todisco
Privacy-Preserving Voice Anti-Spoofing Using Secure Multi-Party Computation.

Interspeech2021 Wanying Ge, Michele Panariello, Jose Patino 0001, Massimiliano Todisco, Nicholas W. D. Evans, 
Partially-Connected Differentiable Architecture Search for Deepfake and Spoofing Detection.

Interspeech2021 Madhu R. Kamble, José Andrés González López, Teresa Grau, Juan M. Espín, Lorenzo Cascioli, Yiqing Huang, Alejandro Gomez-Alanis, Jose Patino 0001, Roberto Font, Antonio M. Peinado, Angel M. Gomez, Nicholas W. D. Evans, Maria A. Zuluaga, Massimiliano Todisco
PANACEA Cough Sound-Based Diagnosis of COVID-19 for the DiCOVA 2021 Challenge.

Interspeech2021 Tomi Kinnunen, Andreas Nautsch, Md. Sahidullah, Nicholas W. D. Evans, Xin Wang 0037, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee, 
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing.

Interspeech2021 Hemlata Tak, Jee-weon Jung, Jose Patino 0001, Massimiliano Todisco, Nicholas W. D. Evans, 
Graph Attention Networks for Anti-Spoofing.

TASLP2020 Tomi Kinnunen, Héctor Delgado, Nicholas W. D. Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang 0037, Md. Sahidullah, Junichi Yamagishi, Douglas A. Reynolds, 
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals.

ICASSP2020 Pramod B. Bachhav, Massimiliano Todisco, Nicholas W. D. Evans, 
Artificial Bandwidth Extension Using Conditional Variational Auto-encoders and Adversarial Learning.

Interspeech2020 Andreas Nautsch, Jose Patino 0001, Natalia A. Tomashenko, Junichi Yamagishi, Paul-Gauthier Noé, Jean-François Bonastre, Massimiliano Todisco, Nicholas W. D. Evans, 
The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment.

Interspeech2020 Hemlata Tak, Jose Patino 0001, Andreas Nautsch, Nicholas W. D. Evans, Massimiliano Todisco
Spoofing Attack Detection Using the Non-Linear Fusion of Sub-Band Classifiers.

Interspeech2020 Natalia A. Tomashenko, Brij Mohan Lal Srivastava, Xin Wang 0037, Emmanuel Vincent 0001, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Jose Patino 0001, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco
Introducing the VoicePrivacy Initiative.

ICASSP2019 Pramod B. Bachhav, Massimiliano Todisco, Nicholas W. D. Evans, 
Latent Representation Learning for Artificial Bandwidth Extension Using a Conditional Variational Auto-encoder.

Interspeech2019 Kong Aik Lee, Ville Hautamäki, Tomi H. Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang 0019, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li 0001, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang 0039, Chunlei Zhang, Fahimeh Bahmaninezhad, Héctor Delgado, Massimiliano Todisco
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences.

Interspeech2019 Andreas Nautsch, Jose Patino 0001, Amos Treiber, Themos Stafylakis, Petr Mizera, Massimiliano Todisco, Thomas Schneider 0003, Nicholas W. D. Evans, 
Privacy-Preserving Speaker Recognition with Cohort Score Normalisation.

Interspeech2019 Andreas Nautsch, Catherine Jasserand, Els Kindt, Massimiliano Todisco, Isabel Trancoso, Nicholas W. D. Evans, 
The GDPR & Speech Data: Reflections of Legal and Technology Communities, First Steps Towards a Common Understanding.

Interspeech2019 Massimiliano Todisco, Xin Wang 0037, Ville Vestman, Md. Sahidullah, Héctor Delgado, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Tomi H. Kinnunen, Kong Aik Lee, 
ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection.

ICASSP2018 Pramod B. Bachhav, Massimiliano Todisco, Nicholas W. D. Evans, 
Efficient Super-Wide Bandwidth Extension Using Linear Prediction Based Analysis-Synthesis.

#126  | Sheng Zhao | Google Scholar   DBLP
Venues: Interspeech: 12; ICASSP: 8; NeurIPS: 2; ICLR: 2; AAAI: 2; TASLP: 1; KDD: 1; ICML: 1; IJCAI: 1
Years: 2022: 9; 2021: 8; 2020: 7; 2019: 6
ISCA Section: speech synthesis: 7; voice conversion and adaptation: 1; language and lexical modeling for asr: 1; asr model training and strategies: 1; asr neural network architectures and training: 1; speech synthesis paradigms and methods: 1
IEEE Keyword: text to speech: 5; speech synthesis: 5; speech recognition: 3; medical image processing: 2; speaker recognition: 2; speech intelligibility: 2; contextual spelling correction: 1; contextual biasing: 1; non autoregressive: 1; iterative methods: 1; optimisation: 1; probability: 1; fast sampling: 1; image denoising: 1; vocoder: 1; denoising diffusion probabilistic models: 1; vocoders: 1; transformer: 1; phonetic posteriorgrams: 1; speech to animation: 1; mixture of experts: 1; computer animation: 1; pre training: 1; text analysis: 1; data reduction: 1; mos prediction: 1; mean bias network: 1; sensitivity analysis: 1; video signal processing: 1; correlation methods: 1; speech quality assessment: 1; lightweight: 1; fast: 1; search problems: 1; autoregressive processes: 1; neural architecture search: 1; untranscribed data: 1; adaptation: 1; signal reconstruction: 1; noisy speech: 1; denoise: 1; speech enhancement: 1; frame level condition: 1; signal denoising: 1; emotion recognition: 1; dilated residual network: 1; speech emotion recognition: 1; multi head self attention: 1
Most Publications: 2022: 24; 2021: 18; 2020: 17; 2019: 15; 2023: 10

TASLP2022 Xiaoqiang Wang, Yanqing Liu, Jinyu Li 0001, Veljko Miljanic, Sheng Zhao, Hosam Khalil, 
Towards Contextual Spelling Correction for Customization of End-to-End Speech Recognition Systems.

ICASSP2022 Zehua Chen, Xu Tan 0003, Ke Wang, Shifeng Pan, Danilo P. Mandic, Lei He 0005, Sheng Zhao
Infergrad: Improving Diffusion Models for Vocoder by Considering Inference in Training.

ICASSP2022 Liyang Chen, Zhiyong Wu 0001, Jun Ling, Runnan Li, Xu Tan 0003, Sheng Zhao
Transformer-S2A: Robust and Efficient Speech-to-Animation.

ICASSP2022 Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan 0003, Sheng Zhao, Tan Lee, 
A Study on the Efficacy of Model Pre-Training In Developing Neural Text-to-Speech System.

Interspeech2022 Yanqing Liu, Ruiqing Xue, Lei He 0005, Xu Tan 0003, Sheng Zhao
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders.

Interspeech2022 Yihan Wu, Xu Tan 0003, Bohan Li 0003, Lei He 0005, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu, 
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios.

Interspeech2022 Dacheng Yin, Chuanxin Tang, Yanqing Liu, Xiaoqiang Wang, Zhiyuan Zhao, Yucheng Zhao, Zhiwei Xiong, Sheng Zhao, Chong Luo, 
RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion.

Interspeech2022 Guangyan Zhang, Kaitao Song, Xu Tan 0003, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech.

NeurIPS2022 Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen, Xu Tan, Danilo P. Mandic, Lei He, Xiangyang Li, Tao Qin, Sheng Zhao, Tie-Yan Liu, 
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis.

ICASSP2021 Yichong Leng, Xu Tan 0003, Sheng Zhao, Frank K. Soong, Xiang-Yang Li 0001, Tao Qin, 
MBNET: MOS Prediction for Synthesized Speech with Mean-Bias Network.

ICASSP2021 Renqian Luo, Xu Tan 0003, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen, Tie-Yan Liu, 
Lightspeech: Lightweight and Fast Text to Speech with Neural Architecture Search.

ICASSP2021 Yuzi Yan, Xu Tan 0003, Bohan Li 0003, Tao Qin, Sheng Zhao, Yuan Shen 0001, Tie-Yan Liu, 
Adaspeech 2: Adaptive Text to Speech with Untranscribed Data.

ICASSP2021 Chen Zhang 0020, Yi Ren 0006, Xu Tan 0003, Jinglin Liu, Kejun Zhang, Tao Qin, Sheng Zhao, Tie-Yan Liu, 
Denoispeech: Denoising Text to Speech with Frame-Level Noise Modeling.

Interspeech2021 Xiaoqiang Wang, Yanqing Liu, Sheng Zhao, Jinyu Li 0001, 
A Light-Weight Contextual Spelling Correction Model for Customizing Transducer-Based Speech Recognition Systems.

Interspeech2021 Yuzi Yan, Xu Tan 0003, Bohan Li 0003, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan Shen 0001, Wei-Qiang Zhang, Tie-Yan Liu, 
Adaptive Text to Speech for Spontaneous Style.

ICLR2021 Yi Ren 0006, Chenxu Hu, Xu Tan 0003, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu, 
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

ICLR2021 Mingjian Chen, Xu Tan 0003, Bohan Li 0003, Yanqing Liu, Tao Qin, Sheng Zhao, Tie-Yan Liu, 
AdaSpeech: Adaptive Text to Speech for Custom Voice.

Interspeech2020 Chengyi Wang 0002, Yu Wu 0012, Yujiao Du, Jinyu Li 0001, Shujie Liu 0001, Liang Lu 0001, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou 0001, 
Semantic Mask for Transformer Based End-to-End Speech Recognition.

Interspeech2020 Mingjian Chen, Xu Tan 0003, Yi Ren 0006, Jin Xu 0010, Hao Sun, Sheng Zhao, Tao Qin, 
MultiSpeech: Multi-Speaker Text to Speech with Transformer.

Interspeech2020 Naihan Li, Shujie Liu 0001, Yanqing Liu, Sheng Zhao, Ming Liu, Ming Zhou 0001, 
MoBoAligner: A Neural Alignment Model for Non-Autoregressive TTS with Monotonic Boundary Search.

#127  | Vidhyasaharan Sethu | Google Scholar   DBLP
Venues: Interspeech: 19; ICASSP: 7; SpeechComm: 3
Years: 2021: 3; 2020: 2; 2019: 3; 2018: 7; 2017: 6; 2016: 8
ISCA Section: language recognition: 3; spoofing detection: 2; tools, corpora and resources: 1; emotion and sentiment analysis: 1; training strategy for speech emotion recognition: 1; emotion recognition and analysis: 1; language identification: 1; emotion modeling: 1; speaker and language recognition applications: 1; speaker recognition evaluation: 1; short utterances speaker recognition: 1; speaker database and anti-spoofing: 1; behavioral signal processing and speaker state and traits analytics: 1; special session: 1; speaker recognition: 1; robust speaker recognition and anti-spoofing: 1
IEEE Keyword: speaker recognition: 4; automatic speaker verification: 2; language identification: 2; i vector: 2; educational courses: 1; computer aided instruction: 1; electrical engineering education: 1; electrical engineering computing: 1; dsp education: 1; cochlear models: 1; filter bank: 1; project based learning: 1; replay detection: 1; voice biometrics: 1; biometrics (access control): 1; adversarial networks: 1; multi task deep learning: 1; speaker normalization: 1; speech recognition: 1; phoneme posterior weighted score: 1; replay attack: 1; phoneme detection: 1; spoofing detection: 1; speaker verification: 1; band pass filters: 1; anti spoofing: 1; asvspoof 2017: 1; spatial differentiation: 1; iir filters: 1; channel bank filters: 1; security of data: 1; signal detection: 1; bidirectional lstm: 1; dnn adaptation: 1; recurrent neural nets: 1; audio recording: 1; factorized hidden variability learning: 1; probability: 1; normal distribution: 1; short duration speaker verification: 1; phonetic variability: 1; speaker phonetic vector: 1; parameter estimation: 1; pattern classification: 1; pllr: 1; natural language processing: 1; hierarchical framework: 1
Most Publications: 2018: 13; 2017: 13; 2015: 10; 2016: 8; 2021: 7

SpeechComm2021 Tharshini Gunendradasan, Eliathamby Ambikairajah, Julien Epps, Vidhyasaharan Sethu, Haizhou Li 0001, 
An adaptive transmission line cochlear model based front-end for replay attack detection.

Interspeech2021 Beena Ahmed, Kirrie J. Ballard, Denis Burnham, Tharmakulasingam Sirojan, Hadi Mehmood, Dominique Estival, Elise Baker, Felicity Cox, Joanne Arciuli, Titia Benders, Katherine Demuth, Barbara Kelly, Chloé Diskin-Holdaway, Mostafa Ali Shahin, Vidhyasaharan Sethu, Julien Epps, Chwee Beng Lee, Eliathamby Ambikairajah, 
AusKidTalk: An Auditory-Visual Corpus of 3- to 12-Year-Old Australian Children's Speech.

Interspeech2021 Deboshree Bose, Vidhyasaharan Sethu, Eliathamby Ambikairajah, 
Parametric Distributions to Model Numerical Emotion Labels.

ICASSP2020 Eliathamby Ambikairajah, Vidhyasaharan Sethu
Cochlear Signal Processing: A Platform for Learning the Fundamentals of Digital Signal Processing.

ICASSP2020 Gajan Suthokumar, Vidhyasaharan Sethu, Kaavya Sriskandaraja, Eliathamby Ambikairajah, 
Adversarial Multi-Task Learning for Speaker Normalization in Replay Detection.

ICASSP2019 Gajan Suthokumar, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Chamith Wijenayake, Eliathamby Ambikairajah, 
Phoneme Specific Modelling and Scoring Techniques for Anti Spoofing System.

ICASSP2019 Buddhi Wickramasinghe, Eliathamby Ambikairajah, Julien Epps, Vidhyasaharan Sethu, Haizhou Li 0001, 
Auditory Inspired Spatial Differentiation for Replay Spoofing Attack Detection.

Interspeech2019 Anda Ouyang, Ting Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah, 
Speech Based Emotion Prediction: Can a Linear Model Work?

SpeechComm2018 Saad Irtza, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Haizhou Li 0001, 
Using language cluster models in hierarchical language identification.

ICASSP2018 Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah, 
Factorized Hidden Variability Learning for Adaptation of Short Duration Language Identification Models.

ICASSP2018 Jianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong-Aik Lee, 
Speaker-Phonetic Vector Estimation for Short Duration Speaker Verification.

Interspeech2018 Mia Atcheson, Vidhyasaharan Sethu, Julien Epps, 
Demonstrating and Modelling Systematic Time-varying Annotator Disagreement in Continuous Emotion Annotation.

Interspeech2018 Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah, 
Sub-band Envelope Features Using Frequency Domain Linear Prediction for Short Duration Language Identification.

Interspeech2018 Kaavya Sriskandaraja, Vidhyasaharan Sethu, Eliathamby Ambikairajah, 
Deep Siamese Architecture Based Replay Detection for Secure Voice Biometric.

Interspeech2018 Gajan Suthokumar, Vidhyasaharan Sethu, Chamith Wijenayake, Eliathamby Ambikairajah, 
Modulation Dynamic Features for the Detection of Replay Attacks.

Interspeech2017 Ting Dang, Vidhyasaharan Sethu, Julien Epps, Eliathamby Ambikairajah, 
An Investigation of Emotion Prediction Uncertainty Using Gaussian Mixture Regression.

Interspeech2017 Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Julien Epps, 
Bidirectional Modelling for Short Duration Language Identification.

Interspeech2017 Saad Irtza, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Haizhou Li 0001, 
Investigating Scalability in Hierarchical Language Identification System.

Interspeech2017 Kong-Aik Lee, Ville Hautamäki, Tomi Kinnunen, Anthony Larcher, Chunlei Zhang, Andreas Nautsch, Themos Stafylakis, Gang Liu 0001, Mickaël Rouvier, Wei Rao, Federico Alegre, J. Ma, Man-Wai Mak, Achintya Kumar Sarkar, Héctor Delgado, Rahim Saeidi, Hagai Aronowitz, Aleksandr Sizov, Hanwu Sun, Trung Hieu Nguyen 0001, G. Wang, Bin Ma 0001, Ville Vestman, Md. Sahidullah, M. Halonen, Anssi Kanervisto, Gaël Le Lan, Fahimeh Bahmaninezhad, Sergey Isadskiy, Christian Rathgeb, Christoph Busch 0001, Georgios Tzimiropoulos, Q. Qian, Z. Wang, Q. Zhao, T. Wang, H. Li, J. Xue, S. Zhu, R. Jin, T. Zhao, Pierre-Michel Bousquet, Moez Ajili, Waad Ben Kheder, Driss Matrouf, Zhi Hao Lim, Chenglin Xu, Haihua Xu, Xiong Xiao, Eng Siong Chng, Benoit G. B. Fauve, Kaavya Sriskandaraja, Vidhyasaharan Sethu, W. W. Lin, Dennis Alexander Lehmann Thomsen, Zheng-Hua Tan, Massimiliano Todisco, Nicholas W. D. Evans, Haizhou Li 0001, John H. L. Hansen, Jean-François Bonastre, Eliathamby Ambikairajah, 
The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016.

Interspeech2017 Jianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong-Aik Lee, 
Incorporating Local Acoustic Variability Information into Short Duration Speaker Verification.

#128  | Mathew Magimai-Doss | Google Scholar   DBLP
Venues: Interspeech: 15; ICASSP: 10; SpeechComm: 3; TASLP: 1
Years: 2022: 3; 2021: 5; 2020: 4; 2019: 7; 2018: 6; 2017: 1; 2016: 3
ISCA Section: speaker state and trait: 2; acoustic signal representation and analysis: 1; speech segmentation: 1; speech recognition of atypical speech: 1; speech signal analysis and representation: 1; disordered speech: 1; assessment of pathological speech and language: 1; alzheimer’s dementia recognition through spontaneous speech: 1; the interspeech 2019 computational paralinguistics challenge (compare): 1; topics in speech and audio signal processing: 1; speaker verification: 1; the interspeech 2018 computational paralinguistics challenge (compare): 1; special session: 1; low resource speech recognition: 1
IEEE Keyword: hidden markov models: 4; medical signal processing: 3; speech recognition: 3; diseases: 2; low pass filters: 2; sign language recognition: 2; convolutional neural networks: 2; speaker recognition: 2; boaw: 1; lung: 1; breathing pattern estimation: 1; covid 19 identification: 1; compare features: 1; phoneme recognition: 1; epidemics: 1; audio signal processing: 1; respiratory parameters: 1; mean square error methods: 1; speech breathing: 1; pneumodynamics: 1; parameter estimation: 1; gaussian processes: 1; dysarthria: 1; pathological speech processing: 1; entropy: 1; lf mmi: 1; phonocardiography: 1; zero frequency filter: 1; time frequency analysis: 1; phonocardiogram: 1; s1–s2 detection: 1; modified zff: 1; gesture recognition: 1; hand shape modeling: 1; natural language processing: 1; handicapped aids: 1; sign language processing: 1; multilingual sign language recognition: 1; hand movement modeling: 1; end to end training: 1; acoustic modeling: 1; children speech recognition: 1; probability: 1; confidence measures: 1; segment level training: 1; local posterior probability: 1; zero frequency filtering: 1; glottal source signals: 1; depression detection: 1; sign language: 1; articulatory features: 1; hidden markov model: 1; subunits: 1; emotion recognition: 1; fundamental frequency: 1; convolutional neural network: 1; end to end learning: 1; speaker verification: 1; convolution: 1; channel bank filters: 1; feedforward neural nets: 1; anti spoofing: 1; cross database: 1; presentation attack detection: 1; spectral statistics: 1
Most Publications: 2021: 14; 2011: 14; 2012: 11; 2007: 11; 2015: 9

ICASSP2022 Zohreh Mostaani, RaviShankar Prasad, Bogdan Vlasenko, Mathew Magimai-Doss
Modeling of Pre-Trained Neural Network Embeddings Learned From Raw Waveform for COVID-19 Infection Detection.

Interspeech2022 Zohreh Mostaani, Mathew Magimai-Doss
On Breathing Pattern Information in Synthetic Speech.

Interspeech2022 Eklavya Sarkar, RaviShankar Prasad, Mathew Magimai-Doss
Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering.

ICASSP2021 Zohreh Mostaani, Venkata Srikanth Nallanthighal, Aki Härmä, Helmer Strik, Mathew Magimai-Doss
On The Relationship Between Speech-Based Breathing Signal Prediction Evaluation Measures and Breathing Parameters Estimation.

Interspeech2021 Enno Hermann, Mathew Magimai-Doss
Handling Acoustic Variation in Dysarthric Speech Recognition Systems Through Model Combination.

Interspeech2021 RaviShankar Prasad, Mathew Magimai-Doss
Identification of F1 and F2 in Speech Using Modified Zero Frequency Filtering.

Interspeech2021 Juan Camilo Vásquez-Correa, Julian Fritsch, Juan Rafael Orozco-Arroyave, Elmar Nöth, Mathew Magimai-Doss
On Modeling Glottal Source Information for Phonation Assessment in Parkinson's Disease.

Interspeech2021 Esaú Villatoro-Tello, S. Pavankumar Dubagunta, Julian Fritsch, Gabriela Ramírez-de-la-Rosa, Petr Motlícek, Mathew Magimai-Doss
Late Fusion of the Available Lexicon and Raw Waveform-Based Acoustic Modeling for Depression and Dementia Recognition.

ICASSP2020 Enno Hermann, Mathew Magimai-Doss
Dysarthric Speech Recognition with Lattice-Free MMI.

ICASSP2020 RaviShankar Prasad, Gürkan Yilmaz, Olivier Chételat, Mathew Magimai-Doss
Detection Of S1 And S2 Locations In Phonocardiogram Signals Using Zero Frequency Filter.

ICASSP2020 Sandrine Tornay, Marzieh Razavi, Mathew Magimai-Doss
Towards Multilingual Sign Language Recognition.

Interspeech2020 Nicholas Cummins, Yilin Pan, Zhao Ren, Julian Fritsch, Venkata Srikanth Nallanthighal, Heidi Christensen, Daniel Blackburn, Björn W. Schuller, Mathew Magimai-Doss, Helmer Strik, Aki Härmä, 
A Comparison of Acoustic and Linguistics Methodologies for Alzheimer's Dementia Recognition.

SpeechComm2019 Dimitri Palaz, Mathew Magimai-Doss, Ronan Collobert, 
End-to-end acoustic modeling using convolutional neural networks for HMM-based automatic speech recognition.

ICASSP2019 S. Pavankumar Dubagunta, Selen Hande Kabil, Mathew Magimai-Doss
Improving Children Speech Recognition through Feature Learning from Raw Speech Signal.

ICASSP2019 S. Pavankumar Dubagunta, Mathew Magimai-Doss
Segment-level Training of ANNs Based on Acoustic Confidence Measures for Hybrid HMM/ANN Speech Recognition.

ICASSP2019 S. Pavankumar Dubagunta, Bogdan Vlasenko, Mathew Magimai-Doss
Learning Voice Source Related Information for Depression Detection.

ICASSP2019 Sandrine Tornay, Marzieh Razavi, Necati Cihan Camgöz, Richard Bowden, Mathew Magimai-Doss
HMM-based Approaches to Model Multichannel Information in Sign Language Inspired from Articulatory Features-based Speech Processing.

Interspeech2019 S. Pavankumar Dubagunta, Mathew Magimai-Doss
Using Speech Production Knowledge for Raw Waveform Modelling Based Styrian Dialect Identification.

Interspeech2019 Hannah Muckenhirn, Vinayak Abrol, Mathew Magimai-Doss, Sébastien Marcel, 
Understanding and Visualizing Raw Waveform-Based CNNs.

SpeechComm2018 Marzieh Razavi, Ramya Rasipuram, Mathew Magimai-Doss
Towards weakly supervised acoustic subword unit discovery and lexicon development using hidden Markov models.

#129  | Herman Kamper | Google Scholar   DBLP
Venues: Interspeech: 16; ICASSP: 9; TASLP: 3; NAACL: 1
Years: 2022: 3; 2021: 6; 2020: 4; 2019: 8; 2018: 3; 2017: 3; 2016: 2
ISCA Section: low-resource speech recognition: 3; multimodal systems: 2; low-resource asr development: 1; zero, low-resource and multi-modal speech recognition: 1; the zero resource speech challenge 2020: 1; topics in asr: 1; the zero resource speech challenge 2019: 1; feature extraction for asr: 1; corpus annotation and evaluation: 1; selected topics in neural speech processing: 1; topics in speech recognition: 1; speech recognition: 1; spoken document processing: 1
IEEE Keyword: speech recognition: 9; natural language processing: 6; zero resource speech processing: 4; acoustic word embeddings: 4; unsupervised learning: 4; low resource speech processing: 3; query by example: 3; multimodal modelling: 3; multilingual models: 2; recurrent neural nets: 2; vocabulary: 2; signal classification: 2; speech translation: 2; visual grounding: 2; semantic retrieval: 2; word acquisition: 2; query processing: 2; indexterms: 1; gaussian processes: 1; self supervised learning: 1; image representation: 1; voice conversion: 1; speech synthesis: 1; acoustic unit discovery: 1; speaker recognition: 1; linguistics: 1; computational linguistics: 1; supervised learning: 1; transfer learning: 1; unwritten languages: 1; speech classification: 1; text analysis: 1; signal reconstruction: 1; signal representation: 1; indexing: 1; keyword spotting: 1; information retrieval: 1; speech retrieval: 1; cross modal matching: 1; image resolution: 1; convolutional neural nets: 1; nearest neighbour methods: 1; one shot learning: 1; decoding: 1; speech search: 1; spoken term discovery: 1; pattern clustering: 1; word segmentation: 1; computational complexity: 1; language translation: 1; weakly supervised learning: 1; unsupervised term discovery: 1; speech segmentation: 1; word discovery: 1; monte carlo methods: 1; unsupervised speech processing: 1; belief networks: 1; markov processes: 1; query by example search: 1; fixed dimensional representations: 1; word processing: 1; segmental acoustic models: 1
Most Publications: 2020: 24; 2021: 21; 2018: 19; 2022: 16; 2019: 16

ICASSP2022 Benjamin van Niekerk, Marc-André Carbonneau, Julian Zaïdi, Matthew Baas, Hugo Seuté, Herman Kamper
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.

Interspeech2022 Matthew Baas, Herman Kamper
Voice Conversion Can Improve ASR in Very Low-Resource Settings.

Interspeech2022 Werner van der Merwe, Herman Kamper, Johan Adam du Preez, 
A Temporal Extension of Latent Dirichlet Allocation for Unsupervised Acoustic Unit Discovery.

TASLP2021 Herman Kamper, Yevgen Matusevych, Sharon Goldwater, 
Improved Acoustic Word Embeddings for Zero-Resource Languages Using Multilingual Transfer.

Interspeech2021 Christiaan Jacobs, Herman Kamper
Multilingual Transfer of Acoustic Word Embeddings Improves When Training on Languages Related to the Target Zero-Resource Language.

Interspeech2021 Herman Kamper, Benjamin van Niekerk, 
Towards Unsupervised Phone and Word Segmentation Using Self-Supervised Vector-Quantized Neural Networks.

Interspeech2021 Benjamin van Niekerk, Leanne Nortje, Matthew Baas, Herman Kamper
Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing.

Interspeech2021 Leanne Nortje, Herman Kamper
Direct Multimodal Few-Shot Learning of Speech and Images.

Interspeech2021 Kayode Olaleye, Herman Kamper
Attention-Based Keyword Localisation in Speech Using Visual Grounding.

ICASSP2020 Sameer Bansal, Herman Kamper, Adam Lopez, Sharon Goldwater, 
Cross-Lingual Topic Prediction For Speech Using Translations.

ICASSP2020 Herman Kamper, Yevgen Matusevych, Sharon Goldwater, 
Multilingual Acoustic Word Embedding Models for Processing Zero-resource Languages.

Interspeech2020 Benjamin van Niekerk, Leanne Nortje, Herman Kamper
Vector-Quantized Neural Networks for Acoustic Unit Discovery in the ZeroSpeech 2020 Challenge.

Interspeech2020 Leanne Nortje, Herman Kamper
Unsupervised vs. Transfer Learning for Multimodal One-Shot Matching of Speech and Images.

TASLP2019 Herman Kamper, Gregory Shakhnarovich, Karen Livescu, 
Semantic Speech Retrieval With a Visually Grounded Model of Untranscribed Speech.

ICASSP2019 Ryan Eloff, Herman A. Engelbrecht, Herman Kamper
Multimodal One-shot Learning of Speech and Images.

ICASSP2019 Herman Kamper
Truly Unsupervised Acoustic Word Embeddings Using Weak Top-down Constraints in Encoder-decoder Models.

ICASSP2019 Herman Kamper, Aristotelis Anastassiou, Karen Livescu, 
Semantic Query-by-example Speech Search Using Visual Grounding.

Interspeech2019 Ryan Eloff, André Nortje, Benjamin van Niekerk, Avashna Govender, Leanne Nortje, Arnu Pretorius, Elan Van Biljon, Ewald van der Westhuizen, Lisa van Staden, Herman Kamper
Unsupervised Acoustic Unit Discovery for Speech Synthesis Using Discrete Latent-Variable Neural Networks.

Interspeech2019 Raghav Menon, Herman Kamper, Ewald van der Westhuizen, John A. Quinn, Thomas Niesler, 
Feature Exploration for Almost Zero-Resource ASR-Free Keyword Spotting Using a Multilingual Bottleneck Extractor and Correspondence Autoencoders.

Interspeech2019 Ankita Pasad, Bowen Shi, Herman Kamper, Karen Livescu, 
On the Contributions of Visual and Textual Supervision in Low-Resource Semantic Speech Retrieval.

#130  | Gakuto Kurata | Google Scholar   DBLP
Venues: Interspeech: 21; ICASSP: 8
Years: 2022: 4; 2021: 2; 2020: 5; 2019: 5; 2018: 2; 2017: 8; 2016: 3
ISCA Section: model training for asr: 3; novel models and training methods for asr: 1; multi-, cross-lingual and other topics in asr: 1; robust asr, and far-field/multi-talker asr: 1; asr: 1; language and lexical modeling for asr: 1; speaker diarization: 1; spoken language understanding: 1; streaming asr: 1; adjusting to speaker, accent, and domain: 1; neural network training strategies for asr: 1; neural network acoustic models for asr: 1; acoustic models for asr: 1; far-field speech recognition: 1; neural networks for language modeling: 1; conversational telephone speech recognition: 1; spoken term detection: 1; neural networks in speech recognition: 1; spoken documents, spoken understanding and semantic analysis: 1
IEEE Keyword: speech recognition: 7; recurrent neural nets: 3; natural language processing: 3; automatic speech recognition: 2; text analysis: 2; neural network: 2; data analysis: 1; spoken language understanding: 1; language translation: 1; domain adaptation: 1; spontaneous speech: 1; parallel corpus: 1; transformer: 1; representation learning: 1; speaker embedding: 1; speaker diarization: 1; speaker recognition: 1; n gram: 1; rnnlm: 1; vocabulary: 1; template: 1; interpolation: 1; subword: 1; broadcast news: 1; deep neural networks: 1; cnn: 1; transforms: 1; joint training: 1; denoising autoencoder: 1; channel bank filters: 1; feedforward neural nets: 1; signal denoising: 1; harmonic structure: 1; acoustic model: 1; time frequency analysis: 1; data augmentation: 1; feature fusion: 1; gaussian processes: 1; noise robust: 1; overlap: 1; monaural speech: 1; hidden markov models: 1; garbage model: 1; mixture models: 1; telephone traffic recording: 1; telephone conversation: 1
Most Publications: 2017: 11; 2019: 8; 2022: 6; 2020: 6; 2016: 6

Interspeech2022 Xiaodong Cui, George Saon, Tohru Nagano, Masayuki Suzuki, Takashi Fukuda, Brian Kingsbury, Gakuto Kurata
Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing.

Interspeech2022 Takashi Fukuda, Samuel Thomas 0001, Masayuki Suzuki, Gakuto Kurata, George Saon, Brian Kingsbury, 
Global RNN Transducer Models For Multi-dialect Speech Recognition.

Interspeech2022 Sashi Novitasari, Takashi Fukuda, Gakuto Kurata
Improving ASR Robustness in Noisy Condition Through VAD Integration.

Interspeech2022 Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Nobuyasu Itoh, George Saon, 
Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems.

ICASSP2021 Samuel Thomas 0001, Hong-Kwang Jeff Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, 
RNN Transducer Models for Spoken Language Understanding.

Interspeech2021 Gakuto Kurata, George Saon, Brian Kingsbury, David Haws, Zoltán Tüske, 
Improving Customization of Neural Transducers by Mitigating Acoustic Mismatch of Synthesized Audio.

ICASSP2020 Shintaro Ando, Masayuki Suzuki, Nobuyasu Itoh, Gakuto Kurata, Nobuaki Minematsu, 
Converting Written Language to Spoken Language with Neural Machine Translation for Language Modeling.

ICASSP2020 Yosuke Higuchi, Masayuki Suzuki, Gakuto Kurata
Speaker Embeddings Incorporating Acoustic Conditions for Diarization.

Interspeech2020 Hagai Aronowitz, Weizhong Zhu, Masayuki Suzuki, Gakuto Kurata, Ron Hoory, 
New Advances in Speaker Diarization.

Interspeech2020 Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas 0001, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis A. Lastras, 
End-to-End Spoken Language Understanding Without Full Transcripts.

Interspeech2020 Gakuto Kurata, George Saon, 
Knowledge Distillation from Offline to Streaming RNN Transducer for End-to-End Speech Recognition.

ICASSP2019 Masayuki Suzuki, Nobuyasu Itoh, Tohru Nagano, Gakuto Kurata, Samuel Thomas 0001, 
Improvements to N-gram Language Model Using Text Generated from Neural Language Model.

ICASSP2019 Samuel Thomas 0001, Masayuki Suzuki, Yinghui Huang, Gakuto Kurata, Zoltán Tüske, George Saon, Brian Kingsbury, Michael Picheny, Tom Dibert, Alice Kaiser-Schatzlein, Bern Samko, 
English Broadcast News Speech Recognition by Humans and Machines.

Interspeech2019 Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata
Direct Neuron-Wise Fusion of Cognate Neural Networks.

Interspeech2019 Gakuto Kurata, Kartik Audhkhasi, 
Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation.

Interspeech2019 Gakuto Kurata, Kartik Audhkhasi, 
Multi-Task CTC Training with Auxiliary Feature Reconstruction for End-to-End Speech Recognition.

Interspeech2018 Takashi Fukuda, Raul Fernandez, Andrew Rosenberg, Samuel Thomas 0001, Bhuvana Ramabhadran, Alexander Sorin, Gakuto Kurata
Data Augmentation Improves Recognition of Foreign Accented Speech.

Interspeech2018 Masayuki Suzuki, Tohru Nagano, Gakuto Kurata, Samuel Thomas 0001, 
Inference-Invariant Transformation of Batch Normalization for Domain Adaptation of Acoustic Models.

ICASSP2017 Takashi Fukuda, Osamu Ichikawa, Gakuto Kurata, Ryuki Tachibana, Samuel Thomas 0001, Bhuvana Ramabhadran, 
Effective joint training of denoising feature space transforms and Neural Network based acoustic models.

ICASSP2017 Osamu Ichikawa, Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, Bhuvana Ramabhadran, 
Harmonic feature fusion for robust neural network-based acoustic modeling.

#131  | Juan Rafael Orozco-Arroyave | Google Scholar   DBLP
Venues: Interspeech: 19; ICASSP: 7; SpeechComm: 3
Years: 2023: 1; 2022: 3; 2021: 5; 2020: 3; 2019: 5; 2018: 4; 2017: 5; 2016: 3
ISCA Section: speech and language analytics for medical applications: 2; pathological speech and language: 2; special session: 2; technology for disordered speech: 1; speech and language in health: 1; show and tell: 1; the interspeech 2021 computational paralinguistics challenge (compare): 1; the adresso challenge: 1; disordered speech: 1; the interspeech 2020 computational paralinguistics challenge (compare): 1; speech perception in adverse listening conditions: 1; applications in language learning and healthcare: 1; social signals detection and speaker traits analysis: 1; speech and language analytics for mental health: 1; automatic detection and recognition of voice and speech disorders: 1; voice, speech and hearing disorders: 1
IEEE Keyword: diseases: 6; medical signal processing: 5; parkinson's disease: 3; gait analysis: 2; speech analysis: 2; parkinson’s disease: 2; patient treatment: 2; neurophysiology: 2; patient monitoring: 2; updrs: 2; natural language processing: 1; acoustic analysis: 1; medical disorders: 1; linguistic analysis: 1; psen1–e280a: 1; alzheimer’s disease: 1; medical diagnostic computing: 1; smartphones: 1; deep learning (artificial intelligence): 1; smart phones: 1; gaussian processes: 1; handwriting analysis: 1; ivectors: 1; mixture models: 1; gmm ubm: 1; speaker recognition: 1; speech enhancement: 1; speech impairments: 1; mobile handsets: 1; mobile devices: 1; classification: 1; signal classification: 1; phonological features: 1; speech synthesis: 1; non modal phonation: 1; phonological vocoding: 1; gcca: 1; multi view learning: 1; handwriting processing: 1; frenchay dysarthria assessment: 1; gait processing: 1; articulation: 1; speech: 1; intelligibility: 1; speech recognition: 1
Most Publications: 2021: 22; 2022: 18; 2019: 16; 2020: 15; 2018: 15

SpeechComm2023 Paula Andrea Pérez-Toro, Tomás Arias-Vergara, Philipp Klumpp, Juan Camilo Vásquez-Correa, Maria Schuster, Elmar Nöth, Juan Rafael Orozco-Arroyave
Depression assessment in people with Parkinson's disease: The combination of acoustic features and natural language processing.

Interspeech2022 Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas K. Maier, Seung Hee Yang, 
Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition.

Interspeech2022 Paula Andrea Pérez-Toro, Philipp Klumpp, Abner Hernandez, Tomas Arias, Patricia Lillo, Andrea Slachevsky, Adolfo Martín García, Maria Schuster, Andreas K. Maier, Elmar Nöth, Juan Rafael Orozco-Arroyave
Alzheimer's Detection from English to Spanish Using Acoustic and Linguistic Embeddings.

Interspeech2022 P. Schäfer, Paula Andrea Pérez-Toro, Philipp Klumpp, Juan Rafael Orozco-Arroyave, Elmar Nöth, Andreas K. Maier, A. Abad, Maria Schuster, Tomás Arias-Vergara, 
CoachLea: an Android Application to Evaluate the Speech Production and Perception of Children with Hearing Loss.

ICASSP2021 Paula Andrea Pérez-Toro, Juan Camilo Vásquez-Correa, Tomas Arias-Vergara, Philipp Klumpp, M. Sierra-Castrillón, M. E. Roldán-López, D. Aguillón, L. Hincapié-Henao, Carlos Andrés Tobón-Quintero, Tobias Bocklet, Maria Schuster, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
Acoustic and Linguistic Analyses to Assess Early-Onset and Genetic Alzheimer's Disease.

ICASSP2021 Juan Camilo Vásquez-Correa, Tomas Arias-Vergara, Philipp Klumpp, Paula Andrea Pérez-Toro, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
End-2-End Modeling of Speech and Gait from Patients with Parkinson's Disease: Comparison Between High Quality Vs. Smartphone Data.

Interspeech2021 Philipp Klumpp, Tobias Bocklet, Tomas Arias-Vergara, Juan Camilo Vásquez-Correa, Paula Andrea Pérez-Toro, Sebastian P. Bayerl, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
The Phonetic Footprint of Covid-19?

Interspeech2021 Paula Andrea Pérez-Toro, Sebastian P. Bayerl, Tomas Arias-Vergara, Juan Camilo Vásquez-Correa, Philipp Klumpp, Maria Schuster, Elmar Nöth, Juan Rafael Orozco-Arroyave, Korbinian Riedhammer, 
Influence of the Interviewer on the Automatic Assessment of Alzheimer's Disease in the Context of the ADReSSo Challenge.

Interspeech2021 Juan Camilo Vásquez-Correa, Julian Fritsch, Juan Rafael Orozco-Arroyave, Elmar Nöth, Mathew Magimai-Doss, 
On Modeling Glottal Source Information for Phonation Assessment in Parkinson's Disease.

SpeechComm2020 Juan Camilo Vásquez-Correa, Tomas Arias-Vergara, Maria Schuster, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
Parallel Representation Learning for the Classification of Pathological Speech: Studies on Parkinson's Disease and Cleft Lip and Palate.

ICASSP2020 Juan Camilo Vásquez-Correa, Tobias Bocklet, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
Comparison of User Models Based on GMM-UBM and I-Vectors for Speech, Handwriting, and Gait Assessment of Parkinson's Disease Patients.

Interspeech2020 Philipp Klumpp, Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Paula Andrea Pérez-Toro, Florian Hönig, Elmar Nöth, Juan Rafael Orozco-Arroyave
Surgical Mask Detection with Deep Recurrent Phonetic Models.

Interspeech2019 Tomas Arias-Vergara, Juan Rafael Orozco-Arroyave, Milos Cernak, Sandra Gollwitzer, Maria Schuster, Elmar Nöth, 
Phone-Attribute Posteriors to Evaluate the Speech of Cochlear Implant Users.

Interspeech2019 José Vicente Egas López, Juan Rafael Orozco-Arroyave, Gábor Gosztolya, 
Assessing Parkinson's Disease from Speech Using Fisher Vectors.

Interspeech2019 Alice Rueda, Juan Camilo Vásquez-Correa, Cristian David Rios-Urrego, Juan Rafael Orozco-Arroyave, Sridhar Krishnan 0001, Elmar Nöth, 
Feature Representation of Pathophysiology of Parkinsonian Dysarthria.

Interspeech2019 Juan Camilo Vásquez-Correa, Tomas Arias-Vergara, Philipp Klumpp, M. Strauss, Arne Küderle, Nils Roth, S. Bayerl, Nicanor García-Ospina, Paula Andrea Pérez-Toro, L. Felipe Parra-Gallego, Cristian David Rios-Urrego, Daniel Escobar-Grisales, Juan Rafael Orozco-Arroyave, Björn M. Eskofier, Elmar Nöth, 
Apkinson: A Mobile Solution for Multimodal Assessment of Patients with Parkinson's Disease.

Interspeech2019 Juan Camilo Vásquez-Correa, Philipp Klumpp, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
Phonet: A Tool Based on Gated Recurrent Neural Networks to Extract Phonological Posteriors from Speech.

SpeechComm2018 Tomas Arias-Vergara, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
Speaker models for monitoring Parkinson's disease progression considering different communication channels and acoustic conditions.

ICASSP2018 Tomas Arias-Vergara, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Philipp Klumpp, Elmar Nöth, 
Unobtrusive Monitoring of Speech Impairments of Parkinson'S Disease Patients Through Mobile Devices.

Interspeech2018 Nicanor García, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth, 
Multimodal I-vectors to Detect and Evaluate Parkinson's Disease.

#132  | Tomoki Hayashi | Google Scholar   DBLP
Venues: Interspeech: 15; ICASSP: 10; TASLP: 4
Years: 2022: 3; 2021: 7; 2020: 7; 2019: 6; 2018: 3; 2017: 3
ISCA Section: speech synthesis: 3; neural techniques for voice conversion and waveform generation: 2; wavenet and novel paradigms: 2; acoustic event detection and acoustic scene classification: 1; speech enhancement, bandwidth extension and hearing aids: 1; voice conversion and adaptation: 1; the zero resource speech challenge 2020: 1; neural waveform generation: 1; sequence models for asr: 1; recurrent neural models for asr: 1; voice conversion and speech synthesis: 1
IEEE Keyword: voice conversion: 6; speech synthesis: 5; speech recognition: 5; recurrent neural nets: 4; autoregressive processes: 3; transformer: 3; convolutional neural nets: 3; vocoders: 3; speech coding: 3; sequence to sequence: 2; streaming: 2; natural language processing: 2; self supervised speech representation: 2; pitch dependent dilated convolution: 2; neural vocoder: 2; audio signal processing: 2; speaker recognition: 2; end to end: 2; non autoregressive: 1; self supervised learning: 1; computer based training: 1; open source: 1; pretraining: 1; parallel wavegan: 1; quasi periodic wavenet: 1; pitch controllability: 1; wavenet: 1; vocoder: 1; quasi periodic structure: 1; end to end speech processing: 1; conformer: 1; any to one voice conversion: 1; signal representation: 1; sequence to sequence modeling: 1; vq wav2vec: 1; open source software: 1; gaussian processes: 1; vector quantized variational autoencoder: 1; nonparallel: 1; supervised learning: 1; self attention: 1; sound event detection: 1; weakly supervised learning: 1; prediction theory: 1; shallow model: 1; laplacian distribution: 1; wavenet vocoder: 1; multiple samples output: 1; linear prediction: 1; voice activity detection: 1; ctc greedy search: 1; signal detection: 1; unpaired data: 1; expert systems: 1; cycle consistency: 1; oversmoothed parameters: 1; wavenet fine tuning: 1; cyclic recurrent neural network: 1; hidden semi markov model (hsmm): 1; polyphonic sound event detection (sed): 1; hidden markov models: 1; hybrid model: 1; recurrent neural network: 1; long short term memory (lstm): 1; duration control: 1
Most Publications: 2020: 25; 2021: 22; 2019: 19; 2018: 16; 2022: 12

ICASSP2022 Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, 
An Investigation of Streaming Non-Autoregressive sequence-to-sequence Voice Conversion.

ICASSP2022 Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe 0001, Tomoki Toda, 
S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations.

Interspeech2022 Jiatong Shi, Shuai Guo, Tao Qian, Tomoki Hayashi, Yuning Wu, Fangzheng Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe 0001, Qin Jin, 
Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis.

TASLP2021 Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda, 
Pretraining Techniques for Sequence-to-Sequence Voice Conversion.

TASLP2021 Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda, 
Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.

TASLP2021 Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda, 
Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.

ICASSP2021 Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi 0003, Shinji Watanabe 0001, Kun Wei, Wangyou Zhang, Yuekai Zhang, 
Recent Developments on Espnet Toolkit Boosted By Conformer.

ICASSP2021 Wen-Chin Huang, Yi-Chiao Wu, Tomoki Hayashi
Any-to-One Sequence-to-Sequence Voice Conversion Using Self-Supervised Discrete Speech Representations.

ICASSP2021 Kazuhiro Kobayashi, Wen-Chin Huang, Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda, 
Crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder.

Interspeech2021 Tatsuya Komatsu, Shinji Watanabe 0001, Koichi Miyazaki, Tomoki Hayashi
Acoustic Event Detection with Classifier Chains.

ICASSP2020 Koichi Miyazaki, Tatsuya Komatsu, Tomoki Hayashi, Shinji Watanabe 0001, Tomoki Toda, Kazuya Takeda, 
Weakly-Supervised Sound Event Detection with Self-Attention.

ICASSP2020 Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, 
Efficient Shallow Wavenet Vocoder Using Multiple Samples Output Based on Laplacian Distribution and Linear Prediction.

ICASSP2020 Takenori Yoshimura, Tomoki Hayashi, Kazuya Takeda, Shinji Watanabe 0001, 
End-to-End Automatic Speech Recognition Integrated with CTC-Based Voice Activity Detection.

Interspeech2020 Shu Hikosaka, Shogo Seki, Tomoki Hayashi, Kazuhiro Kobayashi, Kazuya Takeda, Hideki Banno, Tomoki Toda, 
Intelligibility Enhancement Based on Speech Waveform Modification Using Hearing Impairment.

Interspeech2020 Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda, 
Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining.

Interspeech2020 Patrick Lumban Tobing, Tomoki Hayashi, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda, 
Cyclic Spectral Modeling for Unsupervised Unit Discovery into Voice Conversion with Excitation and Waveform Modeling.

Interspeech2020 Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda, 
Quasi-Periodic Parallel WaveGAN Vocoder: A Non-Autoregressive Pitch-Dependent Dilated Convolution Model for Parametric Speech Generation.

ICASSP2019 Takaaki Hori, Ramón Fernandez Astudillo, Tomoki Hayashi, Yu Zhang 0033, Shinji Watanabe 0001, Jonathan Le Roux, 
Cycle-consistency Training for End-to-end Speech Recognition.

ICASSP2019 Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, 
Voice Conversion with Cyclic Recurrent Neural Network and Fine-tuned Wavenet Vocoder.

Interspeech2019 Tomoki Hayashi, Shinji Watanabe 0001, Tomoki Toda, Kazuya Takeda, Shubham Toshniwal, Karen Livescu, 
Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis.

#133  | Tomohiro Tanaka | Google Scholar   DBLP
VenuesInterspeech: 20ICASSP: 8TASLP: 1
Years2022: 52021: 102020: 62019: 62018: 2
ISCA Sectionspeech representation: 1multi-, cross-lingual and other topics in asr: 1single-channel speech enhancement: 1novel models and training methods for asr: 1spoken language processing: 1voice activity detection and keyword spotting: 1neural network training methods for asr: 1streaming for asr/rnn transducers: 1search/decoding techniques and confidence measures for asr: 1applications in transcription, education and learning: 1training strategies for asr: 1asr neural network architectures and training: 1spoken language understanding: 1conversational systems: 1model training for asr: 1dialogue speech understanding: 1nn architectures for asr: 1spoken term detection, confidence measure, and end-to-end speech recognition: 1selected topics in neural speech processing: 1asr systems and technologies: 1
IEEE Keywordspeech recognition: 8natural language processing: 4recurrent neural nets: 3neural network: 3recurrent neural network transducer: 2end to end: 2knowledge distillation: 2attention based decoder: 1language translation: 1sequence to sequence pre training: 1spoken text normalization: 1text analysis: 1pointer generator networks: 1self supervised learning: 1blind source separation: 1audio visual: 1speech separation: 1audio signal processing: 1and cross modal: 1transformer: 1hierarchical encoder decoder: 1large context endo to end automatic speech recognition: 1synchronisation: 1whole network pre training: 1entropy: 1autoregressive processes: 1reinforcement learning: 1zero resource word segmentation: 1intelligent robots: 1word processing: 1unsupervised learning: 1spoken language acquisition: 1data acquisition: 1probability: 1speech codecs: 1connectionist temporal classification: 1attention weight: 1covariance matrix adaptation evolution strategy (cma es): 1multi objective optimization: 1deep neural network (dnn): 1evolutionary computation: 1genetic algorithm: 1pareto optimisation: 1hidden markov models: 1cloud computing: 1parallel processing: 1speech coding: 1end to end automatic speech recognition: 1attention based encoder decoder: 1hierarchical recurrent encoder decoder: 1
Most Publications: 2021: 28; 2022: 15; 2020: 11; 2019: 11; 2018: 8

Affiliations
URLs

ICASSP2022 Takafumi Moriya, Takanori Ashihara, Atsushi Ando, Hiroshi Sato, Tomohiro Tanaka, Kohei Matsuura, Ryo Masumura, Marc Delcroix, Takahiro Shinozaki, 
Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration.

Interspeech2022 Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models.

Interspeech2022 Ryo Masumura, Yoshihiro Yamazaki, Saki Mizuno, Naoki Makishima, Mana Ihori, Mihiro Uchida, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Shota Orihashi, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando, 
End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training.

Interspeech2022 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura, 
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.

Interspeech2022 Tomohiro Tanaka, Ryo Masumura, Hiroshi Sato, Mana Ihori, Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, 
Domain Adversarial Self-Supervised Speech Representation Learning for Improving Unknown Domain Downstream Tasks.

ICASSP2021 Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura, 
MAPGN: Masked Pointer-Generator Network for Sequence-to-Sequence Pre-Training.

ICASSP2021 Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, Ryo Masumura, 
Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss.

ICASSP2021 Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, 
Hierarchical Transformer-Based Large-Context End-To-End ASR with Large-Context Knowledge Distillation.

ICASSP2021 Takafumi Moriya, Takanori Ashihara, Tomohiro Tanaka, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Yusuke Ijima, Ryo Masumura, Yusuke Shinohara, 
Simpleflat: A Simple Whole-Network Pre-Training Approach for RNN Transducer-Based End-to-End Speech Recognition.

Interspeech2021 Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura, 
Zero-Shot Joint Modeling of Multiple Spoken-Text-Style Conversion Tasks Using Switching Tokens.

Interspeech2021 Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura, 
Enrollment-Less Training for Personalized Voice Activity Detection.

Interspeech2021 Ryo Masumura, Daiki Okamura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, 
Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation.

Interspeech2021 Takafumi Moriya, Tomohiro Tanaka, Takanori Ashihara, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Ryo Masumura, Marc Delcroix, Taichi Asami, 
Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture.

Interspeech2021 Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Takafumi Moriya, Takanori Ashihara, Shota Orihashi, Naoki Makishima, 
Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition.

Interspeech2021 Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Shota Orihashi, Naoki Makishima, 
End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning.

ICASSP2020 Shengzhou Gao, Wenxin Hou, Tomohiro Tanaka, Takahiro Shinozaki, 
Spoken Language Acquisition Based on Reinforcement Learning and Word Unit Segmentation.

ICASSP2020 Takafumi Moriya, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara, 
Distilling Attention Weights for CTC-Based ASR Systems.

Interspeech2020 Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, 
Phoneme-to-Grapheme Conversion Based Large-Scale Pre-Training for End-to-End Automatic Speech Recognition.

Interspeech2020 Takafumi Moriya, Tsubasa Ochiai, Shigeki Karita, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara, Marc Delcroix, 
Self-Distillation for Improving CTC-Transformer-Based ASR Systems.

Interspeech2020 Shota Orihashi, Mana Ihori, Tomohiro Tanaka, Ryo Masumura, 
Unsupervised Domain Adaptation for Dialogue Sequence Labeling Based on Hierarchical Adversarial Training.

#134  | Gábor Gosztolya | Google Scholar   DBLP
Venues: Interspeech: 23; ICASSP: 5; TASLP: 1
Years: 2022: 3; 2021: 4; 2020: 3; 2019: 5; 2018: 5; 2017: 5; 2016: 4
ISCA Sections: special session: 3; acoustic models for asr: 2; the interspeech 2021 computational paralinguistics challenge (compare): 1; assessment of pathological speech and language: 1; topics in asr: 1; computational paralinguistics: 1; speech in health: 1; speech production and silent interfaces: 1; the interspeech 2019 computational paralinguistics challenge (compare): 1; representation learning of emotion and paralinguistics: 1; social signals detection and speaker traits analysis: 1; speech and language analytics for medical applications: 1; speech pathology, depression, and medical applications: 1; the interspeech 2018 computational paralinguistics challenge (compare): 1; novel paradigms for direct synthesis based on speech-related biosignals: 1; topics in speech recognition: 1; speech recognition: 1; social signals, styles, and interaction: 1; acoustic modeling with neural networks: 1; speech and language processing for clinical health applications: 1
IEEE Keywords: x vectors: 3; cognition: 2; neurophysiology: 2; diseases: 2; medical disorders: 2; patient diagnosis: 2; computational paralinguistics: 2; medical speech processing: 1; embeddings: 1; approximation theory: 1; deep neural networks: 1; feedforward neural nets: 1; multiple sclerosis: 1; speech recognition: 1; natural language processing: 1; pre trained: 1; i vectors: 1; depression screening: 1; behavioural sciences computing: 1; brain: 1; biomedical mri: 1; mild cognitive impairment: 1; dementia: 1; medical image processing: 1; sequence to sequence autoencoders: 1; computational linguistics: 1; signal classification: 1; signal representation: 1; ensemble learning: 1; bag of audio words representation: 1; classification: 1; pattern clustering: 1; regression analysis: 1; sleepiness: 1; dnn embeddings: 1; support vector machines: 1; speaker recognition: 1; fundamental frequency: 1; articulatory to acoustic mapping: 1; vocoders: 1; silent speech interface: 1; dnn: 1
Most Publications: 2019: 15; 2021: 11; 2022: 10; 2020: 8; 2018: 8

Affiliations
URLs

ICASSP2022 Gábor Gosztolya, László Tóth 0001, Veronika Svindt, Judit Bóna, Ildikó Hoffmann, 
Using Acoustic Deep Neural Network Embeddings to Detect Multiple Sclerosis From Speech.

ICASSP2022 José Vicente Egas López, Gábor Kiss, Dávid Sztahó, Gábor Gosztolya
Automatic Assessment of the Degree of Clinical Depression from Speech Using X-Vectors.

ICASSP2022 Mercedes Vetráb, José Vicente Egas López, Réka Balogh, Nóra Imre, Ildikó Hoffmann, László Tóth 0001, Magdolna Pákáski, János Kálmán, Gábor Gosztolya
Using Spectral Sequence-to-Sequence Autoencoders to Assess Mild Cognitive Impairment.

TASLP2021 Gábor Gosztolya, Róbert Busa-Fekete, 
Ensemble Bag-of-Audio-Words Representation Improves Paralinguistic Classification Accuracy.

ICASSP2021 José Vicente Egas López, Gábor Gosztolya
Deep Neural Network Embeddings for the Estimation of the Degree of Sleepiness.

Interspeech2021 José Vicente Egas López, Mercedes Vetráb, László Tóth 0001, Gábor Gosztolya
Identifying Conflict Escalation and Primates by Using Ensemble X-Vectors and Fisher Vector Features.

Interspeech2021 Amin Honarmandi Shandiz, László Tóth 0001, Gábor Gosztolya, Alexandra Markó, Tamás Gábor Csapó, 
Neural Speaker Embeddings for Ultrasound-Based Silent Speech Interfaces.

Interspeech2020 Tamás Gábor Csapó, Csaba Zainkó, László Tóth 0001, Gábor Gosztolya, Alexandra Markó, 
Ultrasound-Based Articulatory-to-Acoustic Mapping with WaveGlow Speech Synthesis.

Interspeech2020 Gábor Gosztolya
Very Short-Term Conflict Intensity Estimation Using Fisher Vectors.

Interspeech2020 Gábor Gosztolya, Anita Bagi, Szilvia Szalóki, István Szendi, Ildikó Hoffmann, 
Making a Distinction Between Schizophrenia and Bipolar Disorder Based on Temporal Parameters in Spontaneous Speech.

Interspeech2019 Tamás Gábor Csapó, Mohammed Salah Al-Radhi, Géza Németh, Gábor Gosztolya, Tamás Grósz, László Tóth 0001, Alexandra Markó, 
Ultrasound-Based Silent Speech Interface Built on a Continuous Vocoder.

Interspeech2019 Gábor Gosztolya
Using Fisher Vector and Bag-of-Audio-Words Representations to Identify Styrian Dialects, Sleepiness, Baby & Orca Sounds.

Interspeech2019 Gábor Gosztolya
Using the Bag-of-Audio-Word Feature Representation of ASR DNN Posteriors for Paralinguistic Classification.

Interspeech2019 Gábor Gosztolya, László Tóth 0001, 
Calibrating DNN Posterior Probability Estimates of HMM/DNN Models to Improve Social Signal Detection from Audio Data.

Interspeech2019 José Vicente Egas López, Juan Rafael Orozco-Arroyave, Gábor Gosztolya
Assessing Parkinson's Disease from Speech Using Fisher Vectors.

ICASSP2018 Tamás Grósz, Gábor Gosztolya, László Tóth 0001, Tamás Gábor Csapó, Alexandra Markó, 
F0 Estimation for DNN-Based Ultrasound Silent Speech Interfaces.

Interspeech2018 Gábor Gosztolya, Anita Bagi, Szilvia Szalóki, István Szendi, Ildikó Hoffmann, 
Identifying Schizophrenia Based on Temporal Parameters in Spontaneous Speech.

Interspeech2018 Gábor Gosztolya, Tamás Grósz, László Tóth 0001, 
General Utterance-Level Feature Extraction for Classifying Crying Sounds, Atypical & Self-Assessed Affect and Heart Beats.

Interspeech2018 László Tóth 0001, Gábor Gosztolya, Tamás Grósz, Alexandra Markó, Tamás Gábor Csapó, 
Multi-Task Learning of Speech Recognition and Speech Synthesis Parameters for Ultrasound-based Silent Speech Interfaces.

Interspeech2018 Máté Ákos Tündik, György Szaszák, Gábor Gosztolya, András Beke, 
User-centric Evaluation of Automatic Punctuation in ASR Closed Captioning.

#135  | Ruoming Pang | Google Scholar   DBLP
Venues: ICASSP: 13; Interspeech: 13; ICLR: 2; NeurIPS: 1
Years: 2022: 4; 2021: 8; 2020: 9; 2019: 5; 2018: 3
ISCA Sections: asr neural network architectures: 3; streaming for asr/rnn transducers: 2; asr neural network architectures and training: 2; language modeling and lexical modeling for asr: 1; multi-, cross-lingual and other topics in asr: 1; speech synthesis: 1; streaming asr: 1; lm adaptation, lexical units and punctuation: 1; end-to-end speech recognition: 1
IEEE Keywords: speech recognition: 11; recurrent neural nets: 7; speech coding: 4; text analysis: 2; decoding: 2; conformer: 2; natural language processing: 2; latency: 2; rnn t: 2; optimisation: 2; transducers: 1; rnnt: 1; two pass asr: 1; long form asr: 1; speaker recognition: 1; end to end asr: 1; non streaming asr: 1; model distillation: 1; streaming asr: 1; cascaded encoders: 1; asr: 1; model pruning: 1; dynamic sparse models: 1; regression analysis: 1; probability: 1; endpointer: 1; vocabulary: 1; supervised learning: 1; sequence to sequence: 1; filtering theory: 1; unsupervised learning: 1; semi supervised training: 1; mobile handsets: 1; neural net architecture: 1; speech synthesis: 1; tacotron 2: 1; wavenet: 1; vocoders: 1; text to speech: 1; waveform analysis: 1
Most Publications: 2020: 27; 2021: 23; 2019: 17; 2022: 11; 2018: 8

Affiliations
URLs

ICASSP2022 Ke Hu, Tara N. Sainath, Arun Narayanan, Ruoming Pang, Trevor Strohman, 
Transducer-Based Streaming Deliberation for Cascaded Encoders.

ICASSP2022 Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.

Interspeech2022 W. Ronny Huang, Cal Peyser, Tara N. Sainath, Ruoming Pang, Trevor D. Strohman, Shankar Kumar, 
Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition.

Interspeech2022 Bo Li 0028, Tara N. Sainath, Ruoming Pang, Shuo-Yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani, 
A Language Agnostic Multilingual Streaming On-Device ASR System.

ICASSP2021 Thibault Doutre, Wei Han 0002, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang 0033, Liangliang Cao, 
Improving Streaming Automatic Speech Recognition with Non-Streaming Model Distillation on Unsupervised Data.

ICASSP2021 Bo Li 0028, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han 0002, Qiao Liang, Yu Zhang 0033, Trevor Strohman, Yonghui Wu, 
A Better and Faster end-to-end Model for Streaming ASR.

ICASSP2021 Zhaofeng Wu, Ding Zhao, Qiao Liang, Jiahui Yu, Anmol Gulati, Ruoming Pang
Dynamic Sparsity Neural Networks for Automatic Speech Recognition.

ICASSP2021 Jiahui Yu, Chung-Cheng Chiu, Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han 0002, Anmol Gulati, Yonghui Wu, Ruoming Pang
FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.

Interspeech2021 Thibault Doutre, Wei Han 0002, Chung-Cheng Chiu, Ruoming Pang, Olivier Siohan, Liangliang Cao, 
Bridging the Gap Between Streaming and Non-Streaming ASR Systems by Distilling Ensembles of CTC and RNN-T Models.

Interspeech2021 Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Ruoming Pang, David Rybach, Cyril Allauzen, Ehsan Variani, James Qin, Quoc-Nam Le-The, Shuo-Yiin Chang, Bo Li 0028, Anmol Gulati, Jiahui Yu, Chung-Cheng Chiu, Diamantino Caseiro, Wei Li 0133, Qiao Liang, Pat Rondon, 
An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling.

Interspeech2021 Andros Tjandra, Ruoming Pang, Yu Zhang 0033, Shigeki Karita, 
Unsupervised Learning of Disentangled Speech Content and Style Representation.

ICLR2021 Jiahui Yu, Wei Han 0002, Anmol Gulati, Chung-Cheng Chiu, Bo Li 0028, Tara N. Sainath, Yonghui Wu, Ruoming Pang
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling.

ICASSP2020 Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prabhavalkar, 
Deliberation Model Based Two-Pass End-To-End Speech Recognition.

ICASSP2020 Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Ruoming Pang, Yanzhang He, Trevor Strohman, Yonghui Wu, 
Towards Fast and Accurate Streaming End-To-End ASR.

ICASSP2020 Tara N. Sainath, Yanzhang He, Bo Li 0028, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li 0133, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alexander Gruenstein, Ke Hu, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirkó Visontai, Yonghui Wu, Yu Zhang 0033, Ding Zhao, 
A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency.

ICASSP2020 Tara N. Sainath, Ruoming Pang, Ron J. Weiss, Yanzhang He, Chung-Cheng Chiu, Trevor Strohman, 
An Attention-Based Joint Acoustic and Text on-Device End-To-End Model.

Interspeech2020 Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang 0033, Jiahui Yu, Wei Han 0002, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang
Conformer: Convolution-augmented Transformer for Speech Recognition.

Interspeech2020 Wei Han 0002, Zhengdong Zhang, Yu Zhang 0033, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu, 
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context.

Interspeech2020 Wei Li 0133, James Qin, Chung-Cheng Chiu, Ruoming Pang, Yanzhang He, 
Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition.

Interspeech2020 Cal Peyser, Sepand Mavandadi, Tara N. Sainath, James Apfel, Ruoming Pang, Shankar Kumar, 
Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus.

#136  | Philip C. Woodland | Google Scholar   DBLP
Venues: ICASSP: 15; Interspeech: 10; TASLP: 2; SpeechComm: 1
Years: 2023: 1; 2022: 4; 2021: 6; 2020: 1; 2019: 3; 2018: 5; 2017: 2; 2016: 6
ISCA Sections: neural transducers, streaming asr and novel asr models: 1; robust asr, and far-field/multi-talker asr: 1; neural network training methods for asr: 1; search/decoding techniques and confidence measures for asr: 1; speaker embedding: 1; asr neural network architectures: 1; neural network training strategies for asr: 1; acoustic model adaptation: 1; novel neural network architectures for acoustic modelling: 1; new products and services: 1
IEEE Keywords: speech recognition: 15; natural language processing: 5; probability: 4; speaker recognition: 3; recurrent neural nets: 3; confidence scores: 2; hidden markov models: 2; language models: 2; d vector: 2; gpu: 2; recurrent neural network: 2; estimation theory: 1; out of domain: 1; feature selection: 1; end to end: 1; automatic speech recognition: 1; supervised learning: 1; asr: 1; neural transducer: 1; knowledge distillation: 1; end to end asr: 1; transformer: 1; lstm: 1; cross utterance: 1; content aware speaker embedding: 1; distributed representation: 1; diarisation: 1; emotion recognition: 1; self attention: 1; model combination: 1; speaker diarization: 1; python: 1; convolution: 1; delays: 1; time delay neural network: 1; resnet: 1; grid recurrent neural network: 1; feedforward neural nets: 1; i vectors: 1; speaker adaptation: 1; deep neural networks: 1; multi basis adaptive neural networks: 1; optimisation: 1; mixture models: 1; gaussian processes: 1; variance regularisation: 1; graphics processing units: 1; pipelined training: 1; noise contrastive: 1; estimation: 1; source code (software): 1; language model: 1; open source toolkit: 1; audio segmentation: 1; deep neural network: 1; television broadcasting: 1; pattern clustering: 1; audio signal processing: 1; multi genre broadcast data: 1; error analysis: 1; speech coding: 1; log linear model: 1; hybrid system: 1; joint decoding: 1; tandem system: 1; structured svm: 1
Most Publications: 2021: 18; 2015: 15; 2022: 14; 2023: 10; 2018: 10

Affiliations
URLs

SpeechComm2023 Qiujia Li, Chao Zhang 0031, Philip C. Woodland
Combining hybrid DNN-HMM ASR systems with attention-based models using lattice rescoring.

ICASSP2022 Qiujia Li, Yu Zhang 0033, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland
Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition.

ICASSP2022 Xiaoyu Yang, Qiujia Li, Philip C. Woodland
Knowledge Distillation for Neural Transducers from Large Self-Supervised Pre-Trained Models.

Interspeech2022 Guangzhi Sun, Chao Zhang 0031, Philip C. Woodland
Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition.

Interspeech2022 Xianrui Zheng, Chao Zhang 0031, Philip C. Woodland
Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription.

ICASSP2021 Qiujia Li, David Qiu, Yu Zhang 0033, Bo Li 0028, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman, 
Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition.

ICASSP2021 Guangzhi Sun, Chao Zhang 0031, Philip C. Woodland
Transformer Language Models with LSTM-Based Cross-Utterance Information Representation.

ICASSP2021 Guangzhi Sun, D. Liu, Chao Zhang 0031, Philip C. Woodland
Content-Aware Speaker Embeddings for Speaker Diarisation.

ICASSP2021 Wen Wu, Chao Zhang 0031, Philip C. Woodland
Emotion Recognition by Fusing Time Synchronous and Time Asynchronous Representations.

Interspeech2021 Dongcheng Jiang, Chao Zhang 0031, Philip C. Woodland
Variable Frame Rate Acoustic Models Using Minimum Error Reinforcement Learning.

Interspeech2021 Qiujia Li, Yu Zhang 0033, Bo Li 0028, Liangliang Cao, Philip C. Woodland
Residual Energy-Based Models for End-to-End Speech Recognition.

Interspeech2020 Florian L. Kreyssig, Philip C. Woodland
Cosine-Distance Virtual Adversarial Training for Semi-Supervised Speaker-Discriminative Acoustic Embeddings.

ICASSP2019 Guangzhi Sun, Chao Zhang 0031, Philip C. Woodland
Speaker Diarisation Using 2D Self-attentive Combination of Embeddings.

ICASSP2019 Chao Zhang 0031, Florian L. Kreyssig, Qiujia Li, Philip C. Woodland
PyHTK: Python Library and ASR Pipelines for HTK.

Interspeech2019 Patrick von Platen, Chao Zhang 0031, Philip C. Woodland
Multi-Span Acoustic Modelling Using Raw Waveform Signals.

ICASSP2018 Florian L. Kreyssig, Chao Zhang 0031, Philip C. Woodland
Improved Tdnns Using Deep Kernels and Frequency Dependent Grid-RNNS.

ICASSP2018 Chao Zhang 0031, Philip C. Woodland
High Order Recurrent Neural Networks for Acoustic Modelling.

Interspeech2018 Adnan Haider, Philip C. Woodland
Combining Natural Gradient with Hessian Free Methods for Sequence Training.

Interspeech2018 Yu Wang 0027, Chao Zhang 0031, Mark J. F. Gales, Philip C. Woodland
Speaker Adaptation and Adaptive Training for Jointly Optimised Tandem Systems.

Interspeech2018 Chao Zhang 0031, Philip C. Woodland
Semi-tied Units for Efficient Gating in LSTM and Highway Networks.

#137  | Jinsong Zhang 0001 | Google Scholar   DBLP
Venues: Interspeech: 24; TASLP: 2; ICASSP: 2
Years: 2022: 4; 2021: 4; 2020: 7; 2019: 2; 2018: 5; 2017: 3; 2016: 3
ISCA Sections: applications in transcription, education and learning: 2; pronunciation: 2; miscellaneous topics in speech, voice and hearing disorders: 1; show and tell iii(vr): 1; non-native speech: 1; speech perception: 1; speech signal representation: 1; bi- and multilinguality: 1; tonal aspects of acoustic phonetics and prosody: 1; speech synthesis: 1; speech annotation and speech assessment: 1; first and second language acquisition: 1; bilingualism, l2, and non-nativeness: 1; speech and speaker perception: 1; source and supra-segmentals: 1; second language acquisition and code-switching: 1; deep learning for source separation and pitch tracking: 1; speech prosody: 1; speech production and perception: 1; prosody: 1; learning, education and different speech: 1; prosody, phonation and voice quality: 1
IEEE Keywords: natural language processing: 3; recurrent neural nets: 2; gaussian processes: 2; zerospeech: 2; unsupervised phoneme discovery: 2; dpgmm: 2; unsupervised learning: 2; speech recognition: 2; pronunciation error detection: 2; low resource asr: 1; hearing: 1; infant speech perception: 1; engrams: 1; functional load: 1; rnn: 1; perception of phonemes: 1; pattern classification: 1; computer aided instruction: 1; multi lingual learning: 1; dnn: 1; articulation modeling: 1; computer assisted pronunciation training (capt): 1; nasal coda: 1; computer aided pronunciation training: 1; landmark: 1; error detection: 1; signal classification: 1
Most Publications: 2016: 15; 2022: 14; 2021: 14; 2018: 14; 2020: 13

Affiliations
Beijing Language and Culture University, Beijing, China
TU Dresden, Institute of Acoustics and Speech Communication, Dresden, Germany (former)
URLs

TASLP2022 Bin Wu, Sakriani Sakti, Jinsong Zhang 0001, Satoshi Nakamura 0001, 
Modeling Unsupervised Empirical Adaptation by DPGMM and DPGMM-RNN Hybrid Model to Extract Perceptual Features for Low-Resource ASR.

Interspeech2022 Jingwen Cheng, Yuchen Yan, Yingming Gao, Xiaoli Feng, Yannan Wang, Jinsong Zhang 0001
A study of production error analysis for Mandarin-speaking Children with Hearing Impairment.

Interspeech2022 Yujia Jin, Yanlu Xie, Jinsong Zhang 0001
A VR Interactive 3D Mandarin Pronunciation Teaching Model.

Interspeech2022 Longfei Yang, Jinsong Zhang 0001, Takahiro Shinozaki, 
Self-Supervised Learning with Multi-Target Contrastive Coding for Non-Native Acoustic Modeling of Mispronunciation Verification.

TASLP2021 Bin Wu, Sakriani Sakti, Jinsong Zhang 0001, Satoshi Nakamura 0001, 
Tackling Perception Bias in Unsupervised Phoneme Discovery Using DPGMM-RNN Hybrid Model and Functional Load.

Interspeech2021 Linkai Peng, Kaiqi Fu, Binghuai Lin, Dengfeng Ke, Jinsong Zhang 0001
A Study on Fine-Tuning wav2vec2.0 Model for the Task of Mispronunciation Detection and Diagnosis.

Interspeech2021 Yuqing Zhang 0003, Zhu Li, Binghuai Lin, Jinsong Zhang 0001
A Preliminary Study on Discourse Prosody Encoding in L1 and L2 English Spontaneous Narratives.

Interspeech2021 Yuqing Zhang 0003, Zhu Li, Bin Wu, Yanlu Xie, Binghuai Lin, Jinsong Zhang 0001
Relationships Between Perceptual Distinctiveness, Articulatory Complexity and Functional Load in Speech Communication.

Interspeech2020 Wang Dai, Jinsong Zhang 0001, Yingming Gao, Wei Wei, Dengfeng Ke, Binghuai Lin, Yanlu Xie, 
Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism.

Interspeech2020 Dan Du, Xianjin Zhu, Zhu Li, Jinsong Zhang 0001
Perception and Production of Mandarin Initial Stops by Native Urdu Speakers.

Interspeech2020 Yingming Gao, Xinyu Zhang, Yi Xu, Jinsong Zhang 0001, Peter Birkholz, 
An Investigation of the Target Approximation Model for Tone Modeling and Recognition in Continuous Mandarin Speech.

Interspeech2020 Binghuai Lin, Liyuan Wang, Xiaoli Feng, Jinsong Zhang 0001
Automatic Scoring at Multi-Granularity for L2 Pronunciation.

Interspeech2020 Binghuai Lin, Liyuan Wang, Xiaoli Feng, Jinsong Zhang 0001
Joint Detection of Sentence Stress and Phrase Boundary for Prosody.

Interspeech2020 Yanlu Xie, Xiaoli Feng, Boxue Li, Jinsong Zhang 0001, Yujia Jin, 
A Mandarin L2 Learning APP with Mispronunciation Detection and Feedback.

Interspeech2020 Longfei Yang, Kaiqi Fu, Jinsong Zhang 0001, Takahiro Shinozaki, 
Pronunciation Erroneous Tendency Detection with Language Adversarial Represent Learning.

Interspeech2019 Dan Du, Jinsong Zhang 0001
The Production of Chinese Affricates /ts/ and /tsh/ by Native Urdu Speakers.

Interspeech2019 Shuju Shi, Chilin Shih, Jinsong Zhang 0001
Capturing L1 Influence on L2 Pronunciation by Simulating Perceptual Space Using Acoustic Features.

Interspeech2018 Chong Cao, Wei Wei, Wei Wang, Yanlu Xie, Jinsong Zhang 0001
Interactions between Vowels and Nasal Codas in Mandarin Speakers' Perception of Nasal Finals.

Interspeech2018 Lixia Hao, Wei Zhang 0190, Yanlu Xie, Jinsong Zhang 0001
A Preliminary Study on Tonal Coarticulation in Continuous Speech.

Interspeech2018 Yue Sun, Win Thuzar Kyaw, Jinsong Zhang 0001, Yoshinori Sagisaka, 
Analysis of L2 Learners' Progress of Distinguishing Mandarin Tone 2 and Tone 3.

#138  | Emily Mower Provost | Google Scholar   DBLP
Venues: Interspeech: 18; ICASSP: 5; SpeechComm: 2; NAACL: 1; AAAI: 1; TASLP: 1
Years: 2022: 2; 2021: 4; 2020: 2; 2019: 6; 2018: 4; 2017: 4; 2016: 6
ISCA Sections: speech in health: 2; integrating speech science and technology for clinical applications: 2; emotion recognition: 2; special session: 2; (multimodal) speech emotion recognition: 1; speech and language in health: 1; voice quality characterization for clinical voice assessment: 1; assessment of pathological speech and language: 1; emotion and personality in conversation: 1; emotion modeling and analysis: 1; speech and language analytics for mental health: 1; emotion modeling: 1; pathological speech and language: 1; learning, education and different speech: 1
IEEE Keywords: emotion recognition: 3; speech recognition: 2; multi task learning: 2; annotation: 1; emotion: 1; crowdsourcing: 1; classifier performance: 1; emotion perception: 1; audio and phonemes: 1; convolutional neural networks: 1; speech emotion recognition: 1; prediction theory: 1; lstm: 1; turn taking: 1; spoken dialogues: 1; human computer interaction: 1; recurrent neural networks: 1; speaker intentions: 1; speaker recognition: 1; interactive systems: 1; medical signal processing: 1; notebook computers: 1; signal classification: 1; acoustic modeling: 1; machine learning: 1; clinical application: 1; aphasia: 1; speech intelligibility assessment: 1; patient treatment: 1; apraxia: 1; mobile health: 1; medical disorders: 1; bipolar disorder: 1; mood modeling: 1; speech analysis: 1; cross corpus: 1; speech emotion: 1; sung emotion: 1
Most Publications: 2019: 19; 2021: 12; 2020: 10; 2017: 10; 2016: 9

Affiliations
University of Michigan, Ann Arbor, USA

Interspeech2022 Matthew Perez, Mimansa Jaiswal, Minxue Niu, Cristina Gorrostieta, Matthew Roddy, Kye Taylor, Reza Lotfian, John Kane, Emily Mower Provost
Mind the gap: On the value of silence representations to lexical-based speech emotion recognition.

Interspeech2022 Amrit Romana, Minxue Niu, Matthew Perez, Angela Roberts, Emily Mower Provost
Enabling Off-the-Shelf Disfluency Detection and Categorization for Pathological Speech.

SpeechComm2021 Brian Stasak, Julien Epps, Heather T. Schatten, Ivan W. Miller, Emily Mower Provost, Michael F. Armey, 
Read speech voice quality and disfluency in individuals with recent suicidal ideation or suicide attempt.

Interspeech2021 Matthew Perez, Amrit Romana, Angela Roberts, Noelle Carlozzi, Jennifer Ann Miner, Praveen Dayalu, Emily Mower Provost
Articulatory Coordination for Speech Motor Tracking in Huntington Disease.

Interspeech2021 Amrit Romana, John Bandon, Matthew Perez, Stephanie Gutierrez, Richard Richter, Angela Roberts, Emily Mower Provost
Automatically Detecting Errors and Disfluencies in Read Speech to Predict Cognitive Impairment in People with Parkinson's Disease.

NAACL2021 Zakaria Aldeneh, Matthew Perez, Emily Mower Provost
Learning Paralinguistic Features from Audiobooks through Style Voice Conversion.

Interspeech2020 Matthew Perez, Zakaria Aldeneh, Emily Mower Provost
Aphasic Speech Recognition Using a Mixture of Speech Intelligibility Experts.

Interspeech2020 Amrit Romana, John Bandon, Noelle Carlozzi, Angela Roberts, Emily Mower Provost
Classification of Manifest Huntington Disease Using Vowel Distortion Measures.

ICASSP2019 Mimansa Jaiswal, Zakaria Aldeneh, Cristian-Paul Bara, Yuanhang Luo, Mihai Burzo, Rada Mihalcea, Emily Mower Provost
Muse-ing on the Impact of Utterance Ordering on Crowdsourced Emotion Annotations.

ICASSP2019 Biqiao Zhang, Soheil Khorram, Emily Mower Provost
Exploiting Acoustic and Lexical Properties of Phonemes to Recognize Valence from Speech.

Interspeech2019 Zakaria Aldeneh, Mimansa Jaiswal, Michael Picheny, Melvin G. McInnis, Emily Mower Provost
Identifying Mood Episodes Using Dialogue Features from Clinical Interviews.

Interspeech2019 John Gideon, Heather T. Schatten, Melvin G. McInnis, Emily Mower Provost
Emotion Recognition from Natural Phone Conversations in Individuals with and without Recent Suicidal Ideation.

Interspeech2019 Katie Matton, Melvin G. McInnis, Emily Mower Provost
Into the Wild: Transitioning from Recognizing Mood in Clinical Interactions to Personal Conversations for Individuals with Bipolar Disorder.

AAAI2019 Biqiao Zhang, Yuqing Kong, Georg Essl, Emily Mower Provost
f-Similarity Preservation Loss for Soft Labels: A Demonstration on Cross-Corpus Speech Emotion Recognition.

SpeechComm2018 Duc Le, Keli Licata, Emily Mower Provost
Automatic quantitative analysis of spontaneous aphasic speech.

ICASSP2018 Zakaria Aldeneh, Dimitrios Dimitriadis, Emily Mower Provost
Improving End-of-Turn Detection in Spoken Dialogues by Detecting Speaker Intentions as a Secondary Task.

Interspeech2018 Soheil Khorram, Mimansa Jaiswal, John Gideon, Melvin G. McInnis, Emily Mower Provost
The PRIORI Emotion Dataset: Linking Mood to Emotion Detected In-the-Wild.

Interspeech2018 Matthew Perez, Wenyu Jin 0001, Duc Le, Noelle Carlozzi, Praveen Dayalu, Angela Roberts, Emily Mower Provost
Classification of Huntington Disease Using Acoustic and Lexical Features.

Interspeech2017 John Gideon, Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Emily Mower Provost
Progressive Neural Networks for Transfer Learning in Emotion Recognition.

Interspeech2017 Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Melvin G. McInnis, Emily Mower Provost
Capturing Long-Term Temporal Dependencies with Convolutional Networks for Continuous Emotion Recognition.

#139  | Ron J. Weiss | Google Scholar   DBLP
Venues: Interspeech: 12; ICASSP: 10; ICLR: 2; TASLP: 2; ICML: 1; NeurIPS: 1
Years: 2021: 4; 2020: 2; 2019: 9; 2018: 6; 2017: 4; 2016: 3
ISCA Sections: speech synthesis: 3; speech translation: 2; far-field speech processing: 2; non-autoregressive sequential modeling for speech processing: 1; neural network training methods for asr: 1; medical applications and visual asr: 1; speech enhancement: 1; far-field speech recognition: 1
IEEE Keywordspeech recognition: 7speech synthesis: 4speaker recognition: 3text analysis: 2decoding: 2recurrent neural nets: 2tacotron 2: 2text to speech: 2natural language processing: 2speech enhancement: 2iterative methods: 1self attention: 1vae: 1non autoregressive: 1autoregressive processes: 1computational complexity: 1neural tts: 1supervised learning: 1fine grained vae: 1regression analysis: 1hierarchical: 1gaussian processes: 1autoencoder: 1vector quantisation: 1acoustic unit discovery: 1unsupervised learning: 1speech representation learning: 1spelling correction: 1language model: 1attention models: 1sequence to sequence: 1variational autoencoder: 1adversarial training: 1text to speech synthesis: 1data augmentation: 1texture synthesis: 1style transfer: 1convolutional networks: 1backpropagation: 1voice conversion: 1image texture: 1ctc: 1deep neural networks: 1neural net architecture: 1wavenet: 1vocoders: 1waveform analysis: 1computational linguistics: 1asr: 1multilingual: 1encoder decoder: 1seq2seq: 1indian: 1noise robust speech recognition: 1microphones: 1array signal processing: 1beamforming: 1spatial filters: 1direction of arrival estimation: 1channel bank filters: 1filtering theory: 1acoustic convolution: 1
Most Publications2019: 182017: 142018: 122020: 112021: 8


ICASSP2021 Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang 0033, Ye Jia, Ron J. Weiss, Yonghui Wu, 
Parallel Tacotron: Non-Autoregressive and Controllable TTS.

Interspeech2021 Nanxin Chen, Yu Zhang 0033, Heiga Zen, Ron J. Weiss, Mohammad Norouzi 0002, Najim Dehak, William Chan, 
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis.

Interspeech2021 Peidong Wang, Tara N. Sainath, Ron J. Weiss
Multitask Training with Text Data for End-to-End Speech Recognition.

ICLR2021 Nanxin Chen, Yu Zhang 0033, Heiga Zen, Ron J. Weiss, Mohammad Norouzi 0002, William Chan, 
WaveGrad: Estimating Gradients for Waveform Generation.

ICASSP2020 Tara N. Sainath, Ruoming Pang, Ron J. Weiss, Yanzhang He, Chung-Cheng Chiu, Trevor Strohman, 
An Attention-Based Joint Acoustic and Text on-Device End-To-End Model.

ICASSP2020 Guangzhi Sun, Yu Zhang 0033, Ron J. Weiss, Yuan Cao 0007, Heiga Zen, Yonghui Wu, 
Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis.

TASLP2019 Jan Chorowski, Ron J. Weiss, Samy Bengio, Aäron van den Oord, 
Unsupervised Speech Representation Learning Using WaveNet Autoencoders.

ICASSP2019 Jinxi Guo, Tara N. Sainath, Ron J. Weiss
A Spelling Correction Model for End-to-end Speech Recognition.

ICASSP2019 Wei-Ning Hsu, Yu Zhang 0033, Ron J. Weiss, Yu-An Chung, Yuxuan Wang 0002, Yonghui Wu, James R. Glass, 
Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization.

Interspeech2019 Fadi Biadsy, Ron J. Weiss, Pedro J. Moreno 0001, Dimitri Kanevsky, Ye Jia, 
Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation.

Interspeech2019 Ye Jia, Ron J. Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, Yonghui Wu, 
Direct Speech-to-Speech Translation with a Sequence-to-Sequence Model.

Interspeech2019 Quan Wang, Hannah Muckenhirn, Kevin W. Wilson, Prashant Sridhar, Zelin Wu, John R. Hershey, Rif A. Saurous, Ron J. Weiss, Ye Jia, Ignacio Lopez-Moreno, 
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking.

Interspeech2019 Heiga Zen, Viet Dang, Rob Clark, Yu Zhang 0033, Ron J. Weiss, Ye Jia, Zhifeng Chen, Yonghui Wu, 
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech.

Interspeech2019 Yu Zhang 0033, Ron J. Weiss, Heiga Zen, Yonghui Wu, Zhifeng Chen, R. J. Skerry-Ryan, Ye Jia, Andrew Rosenberg, Bhuvana Ramabhadran, 
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning.

ICLR2019 Wei-Ning Hsu, Yu Zhang 0033, Ron J. Weiss, Heiga Zen, Yonghui Wu, Yuxuan Wang 0002, Yuan Cao 0007, Ye Jia, Zhifeng Chen, Jonathan Shen, Patrick Nguyen, Ruoming Pang, 
Hierarchical Generative Modeling for Controllable Speech Synthesis.

ICASSP2018 Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Ekaterina Gonina, Navdeep Jaitly, Bo Li 0028, Jan Chorowski, Michiel Bacchiani, 
State-of-the-Art Speech Recognition with Sequence-to-Sequence Models.

ICASSP2018 Jan Chorowski, Ron J. Weiss, Rif A. Saurous, Samy Bengio, 
On Using Backpropagation for Speech Texture Generation and Voice Conversion.

ICASSP2018 Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang 0033, Yuxuan Wang 0002, R. J. Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu, 
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions.

ICASSP2018 Shubham Toshniwal, Tara N. Sainath, Ron J. Weiss, Bo Li 0028, Pedro J. Moreno 0001, Eugene Weinstein, Kanishka Rao, 
Multilingual Speech Recognition with a Single End-to-End Model.

ICML2018 R. J. Skerry-Ryan, Eric Battenberg, Ying Xiao, Yuxuan Wang 0002, Daisy Stanton, Joel Shor, Ron J. Weiss, Rob Clark, Rif A. Saurous, 
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron.

#140  | Xiong Xiao | Google Scholar   DBLP
Venues: Interspeech (14); ICASSP (12); SpeechComm (1); TASLP (1)
Years: 2022 (4); 2021 (2); 2020 (3); 2019 (1); 2018 (3); 2017 (2); 2016 (13)
ISCA Sections: spoken term detection (2); other topics in speech recognition (1); robust asr, and far-field/multi-talker asr (1); source separation (1); speaker diarization (1); distant asr (1); speaker recognition evaluation (1); source separation and voice activity detection (1); language recognition (1); speaker diarization and recognition (1); robust speaker recognition and anti-spoofing (1); automatic learning of representations (1); resources and annotation of resources (1)
IEEE Keywords: speaker recognition (7); speech recognition (7); audio signal processing (4); speaker diarization (3); microphone arrays (3); meeting transcription (2); continuous speech separation (2); pattern clustering (2); speaker counting (1); voice activity detection (1); rich transcription (1); probability (1); diarisation (1); sound source localisation (1); hidden markov model (1); hidden markov models (1); speaker location (1); filtering theory (1); source separation (1); speech separation (1); system fusion (1); libricss (1); microphones (1); overlapped speech (1); automatic speech recognition (1); permutation invariant training (1); deep speaker embedding (1); matrix algebra (1); graph theory (1); graph neural networks (1); array signal processing (1); speaker independent speech separation (1); far field (1); acoustic model (1); spotting (1); data compression (1); teacher student learning (1); feature adaptation (1); linear transform (1); temporal filtering (1); robust speech recognition (1); transforms (1); estimation theory (1); keyword spotting (1); deep neural network (dnn) (1); large vocabulary continuous speech recognition (lvcsr) (1); under resourced languages (1); spoken term detection (std) (1); automatic speech recognition (asr) (1); recurrent neural nets (1); signal representation (1); i vector (1); lstm rnns (1); speaker adaptation (1); speaking rate (1); speaker aware training (1); feedforward neural nets (1); phase (1); spoofing attack (1); high dimensional feature (1); counter measure (1); spoofing detection (1); error analysis (1); direction of arrival (1); mean square error methods (1); time frequency analysis (1); eigenvector clustering (1); spatial covariance (1); mixture models (1); eigenvalues and eigenfunctions (1); direction of arrival estimation (1); covariance matrices (1); expectation maximisation algorithm (1); expectation maximization (1); query processing (1); spoken term detection (1); data augmentation (1); time series (1); dtw (1); partial matching (1); query by example (1); reverberation (1)
Most Publications: 2016 (28); 2022 (17); 2015 (17); 2018 (14); 2021 (12)


ICASSP2022 Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.

Interspeech2022 Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.

Interspeech2022 Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.

Interspeech2022 Wangyou Zhang, Zhuo Chen 0006, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei, 
Separating Long-Form Speech with Group-wise Permutation Invariant Training.

ICASSP2021 Jeremy H. M. Wong, Xiong Xiao, Yifan Gong 0001, 
Hidden Markov Model Diarisation with Speaker Location Information.

ICASSP2021 Xiong Xiao, Naoyuki Kanda, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao 0008, Gang Liu, Yu Wu 0012, Jian Wu 0027, Shujie Liu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.

ICASSP2020 Zhuo Chen 0006, Takuya Yoshioka, Liang Lu 0001, Tianyan Zhou, Zhong Meng, Yi Luo 0004, Jian Wu 0027, Xiong Xiao, Jinyu Li 0001, 
Continuous Speech Separation: Dataset and Analysis.

ICASSP2020 Jixuan Wang, Xiong Xiao, Jian Wu 0027, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno, 
Speaker Diarization with Session-Level Speaker Embedding Refinement Using Graph Neural Networks.

Interspeech2020 Jixuan Wang, Xiong Xiao, Jian Wu 0027, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno, 
Speaker Attribution with Voice Profiles by Graph-Based Semi-Supervised Learning.

ICASSP2019 Takuya Yoshioka, Zhuo Chen 0006, Changliang Liu, Xiong Xiao, Hakan Erdogan, Dimitrios Dimitriadis, 
Low-latency Speaker-independent Continuous Speech Separation.

SpeechComm2018 Van Tung Pham, Haihua Xu, Xiong Xiao, Nancy F. Chen, Eng Siong Chng, Haizhou Li 0001, 
Re-ranking spoken term detection with acoustic exemplars of keywords.

ICASSP2018 Jinyu Li 0001, Rui Zhao 0017, Zhuo Chen 0006, Changliang Liu, Xiong Xiao, Guoli Ye, Yifan Gong 0001, 
Developing Far-Field Speaker System Via Teacher-Student Learning.

Interspeech2018 Takuya Yoshioka, Hakan Erdogan, Zhuo Chen 0006, Xiong Xiao, Fil Alleva, 
Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks.

Interspeech2017 Kong-Aik Lee, Ville Hautamäki, Tomi Kinnunen, Anthony Larcher, Chunlei Zhang, Andreas Nautsch, Themos Stafylakis, Gang Liu 0001, Mickaël Rouvier, Wei Rao, Federico Alegre, J. Ma, Man-Wai Mak, Achintya Kumar Sarkar, Héctor Delgado, Rahim Saeidi, Hagai Aronowitz, Aleksandr Sizov, Hanwu Sun, Trung Hieu Nguyen 0001, G. Wang, Bin Ma 0001, Ville Vestman, Md. Sahidullah, M. Halonen, Anssi Kanervisto, Gaël Le Lan, Fahimeh Bahmaninezhad, Sergey Isadskiy, Christian Rathgeb, Christoph Busch 0001, Georgios Tzimiropoulos, Q. Qian, Z. Wang, Q. Zhao, T. Wang, H. Li, J. Xue, S. Zhu, R. Jin, T. Zhao, Pierre-Michel Bousquet, Moez Ajili, Waad Ben Kheder, Driss Matrouf, Zhi Hao Lim, Chenglin Xu, Haihua Xu, Xiong Xiao, Eng Siong Chng, Benoit G. B. Fauve, Kaavya Sriskandaraja, Vidhyasaharan Sethu, W. W. Lin, Dennis Alexander Lehmann Thomsen, Zheng-Hua Tan, Massimiliano Todisco, Nicholas W. D. Evans, Haizhou Li 0001, John H. L. Hansen, Jean-François Bonastre, Eliathamby Ambikairajah, 
The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016.

Interspeech2017 Chenglin Xu, Xiong Xiao, Sining Sun, Wei Rao, Eng Siong Chng, Haizhou Li 0001, 
Weighted Spatial Covariance Matrix Estimation for MUSIC Based TDOA Estimation of Speech Source.

TASLP2016 Duc Hoang Ha Nguyen, Xiong Xiao, Eng Siong Chng, Haizhou Li 0001, 
Feature Adaptation Using Linear Spectro-Temporal Transform for Robust Speech Recognition.

ICASSP2016 Nancy F. Chen, Van Tung Pham, Haihua Xu, Xiong Xiao, Van Hai Do, Chongjia Ni, I-Fan Chen, Sunil Sivadas, Chin-Hui Lee, Eng Siong Chng, Bin Ma 0001, Haizhou Li 0001, 
Exemplar-inspired strategies for low-resource spoken keyword search in Swahili.

ICASSP2016 Tian Tan 0002, Yanmin Qian, Dong Yu 0001, Souvik Kundu 0003, Liang Lu 0001, Khe Chai Sim, Xiong Xiao, Yu Zhang 0033, 
Speaker-aware training of LSTM-RNNS for acoustic modelling.

ICASSP2016 Xiaohai Tian, Zhizheng Wu 0001, Xiong Xiao, Eng Siong Chng, Haizhou Li 0001, 
Spoofing detection from a feature representation perspective.

ICASSP2016 Xiong Xiao, Shengkui Zhao, Thi Ngoc Tho Nguyen, Douglas L. Jones, Eng Siong Chng, Haizhou Li 0001, 
An expectation-maximization eigenvector clustering approach to direction of arrival estimation of multiple speech sources.

#141  | Heidi Christensen | Google Scholar   DBLP
Venues: Interspeech (19); ICASSP (7); TASLP (1); SpeechComm (1)
Years: 2022 (5); 2021 (6); 2020 (8); 2019 (3); 2018 (1); 2017 (3); 2016 (2)
ISCA Sections: speech and language in health (2); assessment of pathological speech and language (2); speech and voice disorders (2); speech in health (2); special session (2); summarization, entity extraction, evaluation and others (1); technology for disordered speech (1); survey talk (1); the adresso challenge (1); alzheimer’s dementia recognition through spontaneous speech (1); voice and hearing disorders (1); medical applications and visual asr (1); integrating speech science and technology for clinical applications (1); disorders related to speech and language (1)
IEEE Keywords: speech recognition (4); handicapped aids (3); cognition (2); diseases (2); multi stream acoustic modelling (1); filtering theory (1); speech coding (1); dysarthric automatic speech recognition (1); source filter separation and fusion (1); regression analysis (1); multi task learning (1); sincnet (1); cognitive decline estimation (1); support vector machines (1); x vector (1); age estimation (1); behavioural sciences computing (1); probability (1); dysarthric speech recognition (1); transfer learning (1); entropy (1); gaussian distribution (1); data selection (1); speaker recognition (1); posterior probability (1); vocabulary (1); natural language processing (1); spectral analysis (1); language modelling (1); out of domain data (1); continuous dysarthric speech recognition (1); software agents (1); pattern classification (1); brain (1); virtual reality (1); automatic speech recognition (1); clinical applications of speech technology (1); medical disorders (1); patient diagnosis (1); speaker diarisation (1); geriatrics (1); medical diagnostic computing (1); neurophysiology (1); phonetics (1); personalised speech recognition (1); gaussian processes (1); speech tempo (1); dysarthria (1); data augmentation (1); hidden markov models (1); mixture models (1); time domain analysis (1); phonological features (1); speech synthesis (1); non modal phonation (1); phonological vocoding (1); medical signal processing (1); parkinson's disease (1); gcca (1); multi view learning (1); handwriting processing (1); frenchay dysarthria assessment (1); updrs (1); gait processing (1); patient treatment (1)
Most Publications: 2020 (9); 2021 (8); 2022 (7); 2017 (7); 2015 (7)


TASLP2022 Zhengjun Yue, Erfan Loweimi, Heidi Christensen, Jon Barker, Zoran Cvetkovic, 
Acoustic Modelling From Raw Source and Filter Components for Dysarthric Speech Recognition.

Interspeech2022 Samuel Hollands, Daniel Blackburn, Heidi Christensen
Evaluating the Performance of State-of-the-Art ASR Systems on Non-Native English using Corpora with Extensive Language Background Variation.

Interspeech2022 Bahman Mirheidari, Daniel Blackburn, Heidi Christensen
Automatic cognitive assessment: Combining sparse datasets with disparate cognitive scores.

Interspeech2022 Bahman Mirheidari, André Bittar, Nicholas Cummins, Johnny Downs, Helen L. Fisher, Heidi Christensen
Automatic Detection of Expressed Emotion from Five-Minute Speech Samples: Challenges and Opportunities.

Interspeech2022 Zhengjun Yue, Erfan Loweimi, Heidi Christensen, Jon Barker, Zoran Cvetkovic, 
Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs.

SpeechComm2021 Lubna Alhinti, Heidi Christensen, Stuart P. Cunningham, 
Acoustic differences in emotional speech of people with dysarthria.

ICASSP2021 Yilin Pan, Venkata Srikanth Nallanthighal, Daniel Blackburn, Heidi Christensen, Aki Härmä, 
Multi-Task Estimation of Age and Cognitive Decline from Speech.

Interspeech2021 Heidi Christensen
Towards Automatic Speech Recognition for People with Atypical Speech.

Interspeech2021 Bahman Mirheidari, Yilin Pan, Daniel Blackburn, Ronan O'Malley, Heidi Christensen
Identifying Cognitive Impairment Using Sentence Representation Vectors.

Interspeech2021 Yilin Pan, Bahman Mirheidari, Jennifer M. Harris, Jennifer C. Thompson, Matthew Jones, Julie S. Snowden, Daniel Blackburn, Heidi Christensen
Using the Outputs of Different Automatic Speech Recognition Paradigms for Acoustic- and BERT-Based Alzheimer's Dementia Detection Through Spontaneous Speech.

Interspeech2021 Zhengjun Yue, Jon Barker, Heidi Christensen, Cristina McKean, Elaine Ashton, Yvonne Wren, Swapnil Gadgil, Rebecca Bright, 
Parental Spoken Scaffolding and Narrative Skills in Crowd-Sourced Storytelling Samples of Young Children.

ICASSP2020 Feifei Xiong, Jon Barker, Zhengjun Yue, Heidi Christensen
Source Domain Data Selection for Improved Transfer Learning Targeting Dysarthric Speech Recognition.

ICASSP2020 Zhengjun Yue, Feifei Xiong, Heidi Christensen, Jon Barker, 
Exploring Appropriate Acoustic and Language Modelling Choices for Continuous Dysarthric Speech Recognition.

Interspeech2020 Lubna Alhinti, Stuart P. Cunningham, Heidi Christensen
Recognising Emotions in Dysarthric Speech Using Typical Speech Data.

Interspeech2020 Nicholas Cummins, Yilin Pan, Zhao Ren, Julian Fritsch, Venkata Srikanth Nallanthighal, Heidi Christensen, Daniel Blackburn, Björn W. Schuller, Mathew Magimai-Doss, Helmer Strik, Aki Härmä, 
A Comparison of Acoustic and Linguistics Methodologies for Alzheimer's Dementia Recognition.

Interspeech2020 Bahman Mirheidari, Daniel Blackburn, Ronan O'Malley, Annalena Venneri, Traci Walker, Markus Reuber, Heidi Christensen
Improving Cognitive Impairment Classification by Generative Neural Network-Based Feature Augmentation.

Interspeech2020 Yilin Pan, Bahman Mirheidari, Markus Reuber, Annalena Venneri, Daniel Blackburn, Heidi Christensen
Improving Detection of Alzheimer's Disease Using Automatic Speech Recognition to Identify High-Quality Segments for More Robust Feature Extraction.

Interspeech2020 Yilin Pan, Bahman Mirheidari, Zehai Tu, Ronan O'Malley, Traci Walker, Annalena Venneri, Markus Reuber, Daniel Blackburn, Heidi Christensen
Acoustic Feature Extraction with Interpretable Deep Neural Network for Neurodegenerative Related Disorder Classification.

Interspeech2020 Zhengjun Yue, Heidi Christensen, Jon Barker, 
Autoencoder Bottleneck Features with Multi-Task Optimisation for Improved Continuous Dysarthric Speech Recognition.

ICASSP2019 Bahman Mirheidari, Daniel Blackburn, Ronan O'Malley, Traci Walker, Annalena Venneri, Markus Reuber, Heidi Christensen
Computational Cognitive Assessment: Investigating the Use of an Intelligent Virtual Agent for the Detection of Early Signs of Dementia.

#142  | John R. Hershey | Google Scholar   DBLP
Venues: ICASSP (15); Interspeech (9); ACL (2); ICLR (1); ICML (1)
Years: 2022 (3); 2021 (3); 2020 (1); 2019 (4); 2018 (7); 2017 (4); 2016 (6)
ISCA Sections: dereverberation, noise reduction, and speaker extraction (1); spatial audio (1); source separation (1); speech technologies for code-switching in multilingual communities (1); speech enhancement (1); spatial and phase cues for source separation and speech recognition (1); far-field speech processing (1); spoken language understanding systems (1); source separation and spatial audio (1)
IEEE Keywords: speech recognition (6); source separation (5); audio signal processing (5); speech enhancement (4); deep clustering (4); speaker recognition (3); pattern clustering (3); cocktail party problem (3); audio recording (2); entropy (2); end to end asr (2); speaker independent multi talker speech separation (2); semi supervised learning (artificial intelligence) (1); mixture invariant training (1); real world audio processing (1); unsupervised learning (1); protocols (1); diarization (1); attention (1); loudspeakers (1); blind source separation (1); signal classification (1); sound classification (1); semantic audio representations (1); audio source separation (1); signal to noise ratio (1); signal denoising (1); objective measure (1); pattern recognition (1); time domain analysis (1); fourier transforms (1); multichannel end to end asr (1); speaker adaptation (1); hidden markov models (1); attention based encoder decoder (1); neural beamformer (1); computational linguistics (1); neural net architecture (1); hybrid attention/ctc (1); natural language processing (1); language identification (1); language independent architecture (1); multilingual asr (1); human computer interaction (1); spatial clustering (1); speaker independent speech separation (1); microphone arrays (1); iterative methods (1); chimera network (1); estimation theory (1); music separation (1); approximation theory (1); music (1); singing voice separation (1); chime 4 (1); student teacher learning (1); distant talking asr (1); self supervised learning (1); multi access systems (1); distance learning (1); optimisation (1); embedding (1); clustering (1); speech separation (1); probability (1); recurrent neural nets (1); maximum likelihood estimation (1); long short term memory (1); recurrent neural network language model (1); minimum word error training (1); multichannel gmm (1); gaussian processes (1); deep unfolding (1); markov random field (1); mixture models (1); markov processes (1)
Most Publications: 2017 (22); 2021 (16); 2018 (15); 2016 (14); 2019 (11)

Affiliations
Mitsubishi Electric Research Laboratories (MERL), Cambridge, USA
IBM T. J. Watson Research Center, New York, USA
University of California San Diego, Department of Cognitive Science

ICASSP2022 Aswin Sivaraman, Scott Wisdom, Hakan Erdogan, John R. Hershey
Adapting Speech Separation to Real-World Meetings using Mixture Invariant Training.

Interspeech2022 Hannah Muckenhirn, Aleksandr Safin, Hakan Erdogan, Felix de Chaumont Quitry, Marco Tagliasacchi, Scott Wisdom, John R. Hershey
CycleGAN-based Unpaired Speech Dereverberation.

Interspeech2022 Katharine Patterson, Kevin W. Wilson, Scott Wisdom, John R. Hershey
Distance-Based Sound Separation.

ICASSP2021 Soumi Maiti, Hakan Erdogan, Kevin W. Wilson, Scott Wisdom, Shinji Watanabe 0001, John R. Hershey
End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings.

Interspeech2021 Cong Han, Yi Luo 0004, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe 0001, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen 0006, 
Continuous Speech Separation Using Speaker Inventory for Long Recording.

ICLR2021 Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Dan Ellis, John R. Hershey
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds.

ICASSP2020 Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, Daniel P. W. Ellis, 
Improving Universal Sound Separation Using Sound Classification.

ICASSP2019 Jonathan Le Roux, Scott Wisdom, Hakan Erdogan, John R. Hershey
SDR - Half-baked or Well Done?

ICASSP2019 Scott Wisdom, John R. Hershey, Kevin W. Wilson, Jeremy Thorpe, Michael Chinen, Brian Patton, Rif A. Saurous, 
Differentiable Consistency Constraints for Improved Deep Speech Enhancement.

Interspeech2019 Hiroshi Seki, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux, John R. Hershey
End-to-End Multilingual Multi-Speaker Speech Recognition.

Interspeech2019 Quan Wang, Hannah Muckenhirn, Kevin W. Wilson, Prashant Sridhar, Zelin Wu, John R. Hershey, Rif A. Saurous, Ron J. Weiss, Ye Jia, Ignacio Lopez-Moreno, 
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking.

ICASSP2018 Tsubasa Ochiai, Shinji Watanabe 0001, Shigeru Katagiri, Takaaki Hori, John R. Hershey
Speaker Adaptation for Multichannel End-to-End Speech Recognition.

ICASSP2018 Hiroshi Seki, Shinji Watanabe 0001, Takaaki Hori, Jonathan Le Roux, John R. Hershey
An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech.

ICASSP2018 Shane Settle, Jonathan Le Roux, Takaaki Hori, Shinji Watanabe 0001, John R. Hershey
End-to-End Multi-Speaker Speech Recognition.

ICASSP2018 Zhong-Qiu Wang, Jonathan Le Roux, John R. Hershey
Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation.

ICASSP2018 Zhong-Qiu Wang, Jonathan Le Roux, John R. Hershey
Alternative Objective Functions for Deep Clustering.

Interspeech2018 Zhong-Qiu Wang, Jonathan Le Roux, DeLiang Wang, John R. Hershey
End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction.

ACL2018 Hiroshi Seki, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux, John R. Hershey
A Purely End-to-End System for Multi-speaker Speech Recognition.

ICASSP2017 Yi Luo 0004, Zhuo Chen 0006, John R. Hershey, Jonathan Le Roux, Nima Mesgarani, 
Deep clustering and conventional networks for music separation: Stronger together.

ICASSP2017 Shinji Watanabe 0001, Takaaki Hori, Jonathan Le Roux, John R. Hershey
Student-teacher network learning with enhanced features.

#143  | Xu Tan 0003 | Google Scholar   DBLP
Venues: ICASSP (8); Interspeech (8); NeurIPS (3); ACL (2); ICLR (2); KDD (2); AAAI (1); TASLP (1); ICML (1)
Years: 2022 (7); 2021 (12); 2020 (5); 2019 (4)
ISCA Sections: speech synthesis (5); voice conversion and adaptation (1); multi- and cross-lingual asr, other topics in asr (1); singing and multimodal synthesis (1)
IEEE Keywords: text to speech (5); speech synthesis (5); medical image processing (2); text analysis (2); speaker recognition (2); speech intelligibility (2); speech recognition (2); iterative methods (1); optimisation (1); probability (1); fast sampling (1); image denoising (1); vocoder (1); denoising diffusion probabilistic models (1); vocoders (1); transformer (1); phonetic posteriorgrams (1); speech to animation (1); mixture of experts (1); computer animation (1); pre training (1); data reduction (1); mos prediction (1); mean bias network (1); sensitivity analysis (1); video signal processing (1); correlation methods (1); speech quality assessment (1); lightweight (1); fast (1); search problems (1); autoregressive processes (1); neural architecture search (1); mixup (1); low resource (1); data augmentation (1); untranscribed data (1); adaptation (1); signal reconstruction (1); noisy speech (1); denoise (1); speech enhancement (1); frame level condition (1); signal denoising (1); error propagation (1); language characteristic (1); sequence generation (1); natural language processing (1); accuracy drop (1)
Most Publications: 2022 (41); 2021 (41); 2020 (32); 2019 (29); 2023 (18)

Affiliations
Microsoft Research Asia, Beijing, China

ICASSP2022 Zehua Chen, Xu Tan 0003, Ke Wang, Shifeng Pan, Danilo P. Mandic, Lei He 0005, Sheng Zhao, 
Infergrad: Improving Diffusion Models for Vocoder by Considering Inference in Training.

ICASSP2022 Liyang Chen, Zhiyong Wu 0001, Jun Ling, Runnan Li, Xu Tan 0003, Sheng Zhao, 
Transformer-S2A: Robust and Efficient Speech-to-Animation.

ICASSP2022 Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan 0003, Sheng Zhao, Tan Lee, 
A Study on the Efficacy of Model Pre-Training In Developing Neural Text-to-Speech System.

Interspeech2022 Yanqing Liu, Ruiqing Xue, Lei He 0005, Xu Tan 0003, Sheng Zhao, 
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders.

Interspeech2022 Yihan Wu, Xu Tan 0003, Bohan Li 0003, Lei He 0005, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu, 
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios.

Interspeech2022 Guangyan Zhang, Kaitao Song, Xu Tan 0003, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao, 
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech.

ACL2022 Yi Ren 0006, Xu Tan 0003, Tao Qin, Zhou Zhao, Tie-Yan Liu, 
Revisiting Over-Smoothness in Text to Speech.

ICASSP2021 Yichong Leng, Xu Tan 0003, Sheng Zhao, Frank K. Soong, Xiang-Yang Li 0001, Tao Qin, 
MBNET: MOS Prediction for Synthesized Speech with Mean-Bias Network.

ICASSP2021 Renqian Luo, Xu Tan 0003, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen, Tie-Yan Liu, 
Lightspeech: Lightweight and Fast Text to Speech with Neural Architecture Search.

ICASSP2021 Linghui Meng 0001, Jin Xu 0010, Xu Tan 0003, Jindong Wang 0001, Tao Qin, Bo Xu 0002, 
MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition.

ICASSP2021 Yuzi Yan, Xu Tan 0003, Bohan Li 0003, Tao Qin, Sheng Zhao, Yuan Shen 0001, Tie-Yan Liu, 
Adaspeech 2: Adaptive Text to Speech with Untranscribed Data.

ICASSP2021 Chen Zhang 0020, Yi Ren 0006, Xu Tan 0003, Jinglin Liu, Kejun Zhang, Tao Qin, Sheng Zhao, Tie-Yan Liu, 
Denoispeech: Denoising Text to Speech with Frame-Level Noise Modeling.

Interspeech2021 Wenxin Hou, Jindong Wang 0001, Xu Tan 0003, Tao Qin, Takahiro Shinozaki, 
Cross-Domain Speech Recognition with Unsupervised Character-Level Distribution Matching.

Interspeech2021 Yuzi Yan, Xu Tan 0003, Bohan Li 0003, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan Shen 0001, Wei-Qiang Zhang, Tie-Yan Liu, 
Adaptive Text to Speech for Spontaneous Style.

NeurIPS2021 Jiawei Chen 0008, Xu Tan 0003, Yichong Leng, Jin Xu 0010, Guihua Wen, Tao Qin, Tie-Yan Liu, 
Speech-T: Transducer for Text to Speech and Beyond.

NeurIPS2021 Yichong Leng, Xu Tan 0003, Linchen Zhu, Jin Xu 0010, Renqian Luo, Linquan Liu, Tao Qin, Xiangyang Li 0001, Edward Lin, Tie-Yan Liu, 
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition.

ICLR2021 Yi Ren 0006, Chenxu Hu, Xu Tan 0003, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu, 
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

ICLR2021 Mingjian Chen, Xu Tan 0003, Bohan Li 0003, Yanqing Liu, Tao Qin, Sheng Zhao, Tie-Yan Liu, 
AdaSpeech: Adaptive Text to Speech for Custom Voice.

AAAI2021 Chen Zhang 0020, Xu Tan 0003, Yi Ren 0006, Tao Qin, Kejun Zhang, Tie-Yan Liu, 
UWSpeech: Speech to Speech Translation for Unwritten Languages.

Interspeech2020 Mingjian Chen, Xu Tan 0003, Yi Ren 0006, Jin Xu 0010, Hao Sun, Sheng Zhao, Tao Qin, 
MultiSpeech: Multi-Speaker Text to Speech with Transformer.

#144  | Bo Xu 0002 | Google Scholar   DBLP
VenuesInterspeech: 14ICASSP: 11AAAI: 2IJCAI: 1
Years2022: 32021: 72020: 32019: 22018: 62017: 22016: 5
ISCA Sectionspeech synthesis: 2sequence models for asr: 2language recognition: 2language and accent recognition: 1source separation, dereverberation and echo cancellation: 1single-channel speech enhancement: 1targeted source separation: 1speaker verification using neural network methods: 1dereverberation: 1multi-lingual models and adaptation for asr: 1acoustic modeling with neural networks: 1
IEEE Keywordspeech recognition: 7natural language processing: 4interactive systems: 2hearing: 2cocktail party problem: 2speech synthesis: 2statistical parametric speech synthesis: 2information retrieval: 1human computer interaction: 1deep learning (artificial intelligence): 1retrieval based dialogue systems: 1semantic networks: 1external knowledge: 1response selection: 1deep matching: 1multi agent systems: 1visual dialog: 1document handling: 1data visualisation: 1contrastive learning: 1text analysis: 1cross modal understanding: 1question answering (information retrieval): 1contextual biasing: 1collaborative decoding: 1contextual speech recognition: 1knowledge selection: 1emotion recognition: 1voiceprint: 1face recognition: 1onset cue: 1voice activity detection: 1onset/offset cues: 1speaker extraction: 1speaker recognition: 1signal representation: 1anechoic chambers (acoustic): 1speaker and direction inferred separation: 1dual channel speech separation: 1source separation: 1mixup: 1low resource: 1data augmentation: 1soft and monotonic alignment: 1decoding: 1continuous integrate and fire: 1acoustic boundary positioning: 1end to end model: 1online speech recognition: 1recurrent neural nets: 1encoder decoder: 1end to end: 1self attention network: 1latency control: 1feedforward neural nets: 1transformer: 1sequence to sequence: 1attention: 1lstm: 1trajectory smoother: 1convolutional output layer: 1high performance: 1mean square error methods: 1gating recurrent mixture density network: 1gru: 1gating units: 1
Most Publications: 2018: 44 | 2021: 30 | 2014: 30 | 2022: 29 | 2016: 27

Affiliations
University of Science and Technology of China, Department of Automation, Hefei, China
Chinese Academy of Sciences, Center for Excellence in Brain Science and Intelligence Technology, Beijing, China
Chinese Academy of Sciences, Institute of Automation, National Laboratory of Pattern Recognition, Beijing, China

ICASSP2022 Xiuyi Chen, Feilong Chen, Shuang Xu, Bo Xu 0002
A Multi Domain Knowledge Enhanced Matching Network for Response Selection in Retrieval-Based Dialogue Systems.

ICASSP2022 Feilong Chen, Xiuyi Chen, Shuang Xu, Bo Xu 0002
Improving Cross-Modal Understanding in Visual Dialog Via Contrastive Learning.

ICASSP2022 Minglun Han, Linhao Dong, Zhenlin Liang, Meng Cai, Shiyu Zhou, Zejun Ma, Bo Xu 0002
Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection.

ICASSP2021 Yunzhe Hao, Jiaming Xu 0001, Peng Zhang, Bo Xu 0002
Wase: Learning When to Attend for Speaker Extraction in Cocktail Party Environments.

ICASSP2021 Chenxing Li, Jiaming Xu 0001, Nima Mesgarani, Bo Xu 0002
Speaker and Direction Inferred Dual-Channel Speech Separation.

ICASSP2021 Linghui Meng 0001, Jin Xu 0010, Xu Tan 0003, Jindong Wang 0001, Tao Qin, Bo Xu 0002
MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition.

Interspeech2021 Zhiyun Fan, Meng Li, Shiyu Zhou, Bo Xu 0002
Exploring wav2vec 2.0 on Speaker Verification and Language Identification.

Interspeech2021 Xiyun Li, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Jiaming Xu 0001, Bo Xu 0002, Dong Yu 0001, 
MIMO Self-Attentive RNN Beamformer for Multi-Speaker Speech Separation.

AAAI2021 Qianqian Dong, Mingxuan Wang, Hao Zhou 0012, Shuang Xu, Bo Xu 0002, Lei Li 0005, 
Consecutive Decoding for Speech-to-text Translation.

AAAI2021 Qianqian Dong, Rong Ye, Mingxuan Wang, Hao Zhou 0012, Shuang Xu, Bo Xu 0002, Lei Li 0005, 
Listen, Understand and Translate: Triple Supervision Decouples End-to-end Speech-to-text Translation.

ICASSP2020 Linhao Dong, Bo Xu 0002
CIF: Continuous Integrate-And-Fire for End-To-End Speech Recognition.

Interspeech2020 Jing Shi 0003, Jiaming Xu 0001, Yusuke Fujita, Shinji Watanabe 0001, Bo Xu 0002
Speaker-Conditional Chain Model for Speech Separation and Extraction.

Interspeech2020 Yunzhe Hao, Jiaming Xu 0001, Jing Shi 0003, Peng Zhang, Lei Qin, Bo Xu 0002
A Unified Framework for Low-Latency Speaker Extraction in Cocktail Party Environments.

ICASSP2019 Linhao Dong, Feng Wang 0023, Bo Xu 0002
Self-attention Aligner: A Latency-control End-to-end Model for ASR Using Self-attention Network and Chunk-hopping.

Interspeech2019 Yuxiang Zou, Linhao Dong, Bo Xu 0002
Boosting Character-Based Chinese Speech Synthesis via Multi-Task Learning and Dictionary Tutoring.

ICASSP2018 Linhao Dong, Shuang Xu, Bo Xu 0002
Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition.

Interspeech2018 Linhao Dong, Shiyu Zhou, Wei Chen 0048, Bo Xu 0002
Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin.

Interspeech2018 Ruifang Ji, Xinyuan Cai, Bo Xu 0002
An End-to-End Text-Independent Speaker Identification System on Short Utterances.

Interspeech2018 Chenxing Li, Tieqiang Wang, Shuang Xu, Bo Xu 0002
Single-channel Speech Dereverberation via Generative Adversarial Training.

Interspeech2018 Shiyu Zhou, Linhao Dong, Shuang Xu, Bo Xu 0002
Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese.

#145  | Tsubasa Ochiai | Google Scholar   DBLP
Venues: Interspeech: 14 | ICASSP: 13 | ICML: 1
Years: 2022: 6 | 2021: 8 | 2020: 5 | 2019: 4 | 2018: 2 | 2017: 2 | 2016: 1
ISCA Section: dereverberation, noise reduction, and speaker extraction: 1 | speech enhancement and intelligibility: 1 | search/decoding algorithms for asr: 1 | novel models and training methods for asr: 1 | single-channel speech enhancement: 1 | source separation: 1 | streaming for asr/rnn transducers: 1 | source separation, dereverberation and echo cancellation: 1 | speech localization, enhancement, and quality assessment: 1 | asr neural network architectures and training: 1 | targeted source separation: 1 | asr for noisy and far-field speech: 1 | speech and audio source separation and scene analysis: 1 | recurrent neural models for asr: 1
IEEE Keyword: speech recognition: 12 | speech enhancement: 7 | speaker recognition: 5 | neural network: 4 | speech extraction: 2 | blind source separation: 2 | array signal processing: 2 | source separation: 2 | reverberation: 2 | recurrent neural nets: 2 | dynamic stream weights: 2 | target speech extraction: 2 | time domain network: 2 | hidden markov models: 2 | noise robust speech recognition: 1 | speakerbeam: 1 | speech separation: 1 | deep learning (artificial intelligence): 1 | input switching: 1 | complex backpropagation: 1 | transfer functions: 1 | signal to distortion ratio: 1 | convolution: 1 | multi channel source separation: 1 | acoustic beamforming: 1 | meeting recognition: 1 | speaker activity: 1 | recurrent neural network transducer: 1 | synchronisation: 1 | end to end: 1 | natural language processing: 1 | whole network pre training: 1 | entropy: 1 | autoregressive processes: 1 | audio visual systems: 1 | audiovisual speaker localization: 1 | sensor fusion: 1 | audio signal processing: 1 | data fusion: 1 | video signal processing: 1 | image fusion: 1 | spatial features: 1 | multi task loss: 1 | microphone arrays: 1 | single channel speech enhancement: 1 | robust asr: 1 | time domain analysis: 1 | signal denoising: 1 | tracking: 1 | backprop kalman filter: 1 | backpropagation: 1 | audiovisual speaker tracking: 1 | kalman filters: 1 | adaptation: 1 | auxiliary feature: 1 | speech separation/extraction: 1 | speaker attention: 1 | multichannel end to end asr: 1 | speaker adaptation: 1 | attention based encoder decoder: 1 | neural beamformer: 1 | group lasso regularization: 1 | deep neural networks: 1 | linear transformation network: 1 | deep neural network: 1 | singular value decomposition: 1 | speaker adaptive training: 1
Most Publications: 2021: 21 | 2022: 17 | 2020: 12 | 2023: 6 | 2017: 6

Affiliations
URLs

ICASSP2022 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Naoyuki Kamo, Takafumi Moriya, 
Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition.

Interspeech2022 Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolíková, Hiroshi Sato, Tomohiro Nakatani, 
Listen only to me! How well can target speech extraction handle false alarms?

Interspeech2022 Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri, 
How bad are artifacts?: Analyzing the impact of speech enhancement errors on ASR.

Interspeech2022 Martin Kocour, Katerina Zmolíková, Lucas Ondel, Jan Svec, Marc Delcroix, Tsubasa Ochiai, Lukás Burget, Jan Cernocký, 
Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model.

Interspeech2022 Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki, 
Streaming Target-Speaker ASR with Neural Transducer.

Interspeech2022 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura, 
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.

ICASSP2021 Christoph Böddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Reinhold Haeb-Umbach, 
Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.

ICASSP2021 Marc Delcroix, Katerina Zmolíková, Tsubasa Ochiai, Keisuke Kinoshita, Tomohiro Nakatani, 
Speaker Activity Driven Neural Speech Extraction.

ICASSP2021 Takafumi Moriya, Takanori Ashihara, Tomohiro Tanaka, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Yusuke Ijima, Ryo Masumura, Yusuke Shinohara, 
Simpleflat: A Simple Whole-Network Pre-Training Approach for RNN Transducer-Based End-to-End Speech Recognition.

ICASSP2021 Julio Wissing, Benedikt T. Boenninghoff, Dorothea Kolossa, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Christopher Schymura, 
Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain.

Interspeech2021 Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Shoko Araki, 
Few-Shot Learning of New Sound Classes for Target Sound Extraction.

Interspeech2021 Takafumi Moriya, Tomohiro Tanaka, Takanori Ashihara, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Ryo Masumura, Marc Delcroix, Taichi Asami, 
Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture.

Interspeech2021 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoyuki Kamo, 
Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition.

Interspeech2021 Christopher Schymura, Benedikt T. Bönninghoff, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa, 
PILOT: Introducing Transformers for Probabilistic Sound Event Localization.

ICASSP2020 Marc Delcroix, Tsubasa Ochiai, Katerina Zmolíková, Keisuke Kinoshita, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki, 
Improving Speaker Discrimination of Target Speech Extraction With Time-Domain Speakerbeam.

ICASSP2020 Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, 
Improving Noise Robust Automatic Speech Recognition with Single-Channel Time-Domain Enhancement Network.

ICASSP2020 Christopher Schymura, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa, 
A Dynamic Stream Weight Backprop Kalman Filter for Audiovisual Speaker Tracking.

Interspeech2020 Takafumi Moriya, Tsubasa Ochiai, Shigeki Karita, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara, Marc Delcroix, 
Self-Distillation for Improving CTC-Transformer-Based ASR Systems.

Interspeech2020 Tsubasa Ochiai, Marc Delcroix, Yuma Koizumi, Hiroaki Ito, Keisuke Kinoshita, Shoko Araki, 
Listen to What You Want: Neural Network-Based Universal Sound Selector.

ICASSP2019 Marc Delcroix, Katerina Zmolíková, Tsubasa Ochiai, Keisuke Kinoshita, Shoko Araki, Tomohiro Nakatani, 
Compact Network for Speakerbeam Target Speaker Extraction.

#146  | Julia Hirschberg | Google Scholar   DBLP
Venues: Interspeech: 25 | SpeechComm: 2
Years: 2021: 1 | 2020: 3 | 2019: 4 | 2018: 5 | 2017: 3 | 2016: 11
ISCA Section: special session: 8 | social signals detection and speaker traits analysis: 2 | speech technologies for code-switching in multilingual communities: 2 | deception, personality, and culture attribute: 2 | speech synthesis: 2 | health and affect: 1 | speech in multimodality: 1 | spoken term detection: 1 | text analysis, multilingual issues and evaluation in speech synthesis: 1 | representation learning for emotion: 1 | stance, credibility, and deception: 1 | behavioral signal processing and speaker state and traits analytics: 1 | special event: 1 | spoken dialogue systems: 1
IEEE Keyword
Most Publications: 2016: 16 | 2018: 13 | 2014: 12 | 2001: 12 | 2009: 11


Interspeech2021 Huyen Nguyen, Ralph Vente, David Lupea, Sarah Ita Levitan, Julia Hirschberg
Acoustic-Prosodic, Lexical and Demographic Cues to Persuasiveness in Competitive Debate Speeches.

SpeechComm2020 Andreas Weise, Sarah Ita Levitan, Julia Hirschberg, Rivka Levitan, 
Individual differences in acoustic-prosodic entrainment in spoken dialogue.

SpeechComm2020 Ramiro H. Gálvez, Agustín Gravano, Stefan Benus, Rivka Levitan, Marián Trnka, Julia Hirschberg
An empirical study of the effect of acoustic-prosodic entrainment on the perceived trustworthiness of conversational avatars.

Interspeech2020 Jiaxuan Zhang, Sarah Ita Levitan, Julia Hirschberg
Multimodal Deception Detection Using Automatically Extracted Acoustic, Visual, and Lexical Features.

Interspeech2019 Alice Baird, Eduardo Coutinho, Julia Hirschberg, Björn W. Schuller, 
Sincerity in Acted Speech: Presenting the Sincere Apology Corpus and Results.

Interspeech2019 Victor Soto, Julia Hirschberg
Improving Code-Switched Language Modeling Performance Using Cognate Features.

Interspeech2019 Zixiaofan Yang, Julia Hirschberg
Linguistically-Informed Training of Acoustic Word Embeddings for Low-Resource Languages.

Interspeech2019 Zixiaofan Yang, Bingyan Hu, Julia Hirschberg
Predicting Humor by Learning from Time-Aligned Comments.

Interspeech2018 Guozhen An, Sarah Ita Levitan, Julia Hirschberg, Rivka Levitan, 
Deep Personality Recognition for Deception Detection.

Interspeech2018 Kai-Zhan Lee, Erica Cooper, Julia Hirschberg
A Comparison of Speaker-based and Utterance-based Data Selection for Text-to-Speech Synthesis.

Interspeech2018 Sarah Ita Levitan, Angel Maredia, Julia Hirschberg
Acoustic-Prosodic Indicators of Deception and Trust in Interview Dialogues.

Interspeech2018 Victor Soto, Nishmar Cestero, Julia Hirschberg
The Role of Cognate Words, POS Tags and Entrainment in Code-Switching.

Interspeech2018 Zixiaofan Yang, Julia Hirschberg
Predicting Arousal and Valence from Waveforms and Spectrograms Using Deep Neural Networks.

Interspeech2017 Erica Cooper, Xinyue Wang, Alison Chang, Yocheved Levitan, Julia Hirschberg
Utterance Selection for Optimizing Intelligibility of TTS Voices Trained on ASR Data.

Interspeech2017 Gideon Mendels, Sarah Ita Levitan, Kai-Zhan Lee, Julia Hirschberg
Hybrid Acoustic-Lexical Deep Learning Approach for Deception Detection.

Interspeech2017 Victor Soto, Julia Hirschberg
Crowdsourcing Universal Part-of-Speech Tags for Code-Switching.

Interspeech2016 Guozhen An, Sarah Ita Levitan, Rivka Levitan, Andrew Rosenberg, Michelle Levine, Julia Hirschberg
Automatically Classifying Self-Rated Personality Scores from Speech.

Interspeech2016 Erica Cooper, Alison Chang, Yocheved Levitan, Julia Hirschberg
Data Selection and Adaptation for Naturalness in HMM-Based Speech Synthesis.

Interspeech2016 Mona T. Diab, Pascale Fung, Julia Hirschberg, Thamar Solorio, 
Computational Approaches to Linguistic Code Switching.

Interspeech2016 Sarah Ita Levitan, Guozhen An, Min Ma, Rivka Levitan, Andrew Rosenberg, Julia Hirschberg
Combining Acoustic-Prosodic, Lexical, and Phonotactic Features for Automatic Deception Detection.

#147  | Hugo Van hamme | Google Scholar   DBLP
Venues: Interspeech: 15 | ICASSP: 7 | TASLP: 3 | SpeechComm: 2
Years: 2022: 5 | 2021: 4 | 2020: 2 | 2019: 2 | 2018: 4 | 2017: 3 | 2016: 7
ISCA Section: speech and audio source separation and scene analysis: 2 | deep neural networks: 2 | low resource spoken language understanding: 1 | speech articulation & neural processing: 1 | non-intrusive objective speech quality assessment (nisqa) challenge for online conferencing applications: 1 | spoken language understanding: 1 | speech and audio analysis: 1 | speech perception: 1 | assessment of pathological speech and language: 1 | disordered speech: 1 | spoken dialogue systems and conversational analysis: 1 | source separation and voice activity detection: 1 | speaker diarization and recognition: 1
IEEE Keyword: speech recognition: 3 | speech enhancement: 3 | electroencephalography: 2 | medical signal processing: 2 | eeg: 2 | speech decoding: 2 | source separation: 2 | matrix decomposition: 2 | signal representation: 2 | pattern classification: 1 | signal classification: 1 | domain generalization: 1 | unsupervised learning: 1 | factorized hierarchical variational autoencoder: 1 | recurrent neural nets: 1 | lstm: 1 | brain: 1 | convolutional neural nets: 1 | auditory system: 1 | brain computer interfaces: 1 | cnn: 1 | neurophysiology: 1 | capsule networks: 1 | speaker identification: 1 | multitask learning: 1 | end to end: 1 | belief networks: 1 | spoken language understanding: 1 | cross domain learning: 1 | joint learning: 1 | multi speaker source separation: 1 | speaker recognition: 1 | room impulse response estimation: 1 | microphones: 1 | sparse representations: 1 | dereverberation: 1 | matrix algebra: 1 | non negative matrix: 1 | fourier transforms: 1 | exemplar based: 1 | reverberation: 1 | user interfaces: 1 | natural language processing: 1 | hidden markov models: 1 | nonnegative matrix factorisation: 1 | weak supervision: 1 | semantic networks: 1 | language acquisition: 1 | deep auto encoder: 1 | unseen noise compensation: 1 | convex programming: 1 | non negative matrix factorisation: 1 | speech dereverberation: 1 | deconvolution: 1 | cepstral analysis: 1 | approximation theory: 1 | non negative matrix de convolution: 1 | signal denoising: 1 | transient response: 1 | estimation theory: 1 | probability: 1 | language model adaptation: 1 | named entities: 1 | phrase based machine translation: 1 | spoken translations: 1 | noise robust exemplar matching: 1 | collinearity reduction: 1 | k medoids: 1 | alpha beta divergence: 1 | exemplar selection: 1
Most Publications: 2022: 25 | 2012: 21 | 2014: 20 | 2015: 18 | 2013: 18


ICASSP2022 Lies Bollens, Tom Francart, Hugo Van hamme
Learning Subject-Invariant Representations from Speech-Evoked EEG Using Variational Autoencoders.

Interspeech2022 Quentin Meeus, Marie-Francine Moens, Hugo Van hamme
Multitask Learning for Low Resource Spoken Language Understanding.

Interspeech2022 Corentin Puffay, Jana Van Canneyt, Jonas Vanthornhout, Hugo Van hamme, Tom Francart, 
Relating the fundamental frequency of speech with EEG using a dilated convolutional network.

Interspeech2022 Bastiaan Tamm, Helena Balabin, Rik Vandenberghe, Hugo Van hamme
Pre-trained Speech Representations as Feature Extractors for Speech Quality Assessment in Online Conferencing Applications.

Interspeech2022 Pu Wang, Hugo Van hamme
Bottleneck Low-rank Transformers for Low-resource Spoken Language Understanding.

Interspeech2021 Wim Boes, Hugo Van hamme
Audiovisual Transfer Learning for Audio Tagging and Sound Event Detection.

Interspeech2021 Mohammad Jalilpour-Monesi, Bernd Accou, Tom Francart, Hugo Van hamme
Extracting Different Levels of Speech Information from EEG Using an LSTM-Based Model.

Interspeech2021 Jinzi Qi, Hugo Van hamme
Speech Disorder Classification Using Extended Factorized Hierarchical Variational Auto-Encoders.

Interspeech2021 Pu Wang, Bagher BabaAli, Hugo Van hamme
A Study into Pre-Training Strategies for Spoken Language Understanding on Dysarthric Speech.

ICASSP2020 Mohammad Jalilpour-Monesi, Bernd Accou, Jair Montoya-Martínez, Tom Francart, Hugo Van hamme
An LSTM Based Architecture to Relate Speech Stimulus to Eeg.

ICASSP2020 Jakob Poncelet, Hugo Van hamme
Multitask Learning with Capsule Networks for Speech-to-Intent Applications.

Interspeech2019 Pieter Appeltans, Jeroen Zegers, Hugo Van hamme
Practical Applicability of Deep Neural Networks for Overlapping Speaker Separation.

Interspeech2019 Jeroen Zegers, Hugo Van hamme
CNN-LSTM Models for Multi-Speaker Source Separation Using Bayesian Hyper Parameter Optimization.

ICASSP2018 Jeroen Zegers, Hugo Van hamme
Multi-Scenario Deep Learning for Multi-Speaker Source Separation.

Interspeech2018 Vincent Renkens, Hugo Van hamme
Capsule Networks for Low Resource Spoken Language Understanding.

Interspeech2018 Lyan Verwimp, Hugo Van hamme, Vincent Renkens, Patrick Wambacq, 
State Gradients for RNN Memory Analysis.

Interspeech2018 Jeroen Zegers, Hugo Van hamme
Memory Time Span in LSTMs for Multi-Speaker Source Separation.

TASLP2017 Deepak Baby, Hugo Van hamme
Joint Denoising and Dereverberation Using Exemplar-Based Sparse Representations and Decaying Norm Constraint.

TASLP2017 Vincent Renkens, Hugo Van hamme
Weakly Supervised Learning of Hidden Markov Models for Spoken Language Acquisition.

Interspeech2017 Jeroen Zegers, Hugo Van hamme
Improving Source Separation via Multi-Speaker Representations.

#148  | Md. Sahidullah | Google Scholar   DBLP
Venues: Interspeech: 15 | ICASSP: 6 | SpeechComm: 4 | TASLP: 2
Years: 2023: 1 | 2022: 2 | 2021: 4 | 2020: 3 | 2019: 2 | 2018: 3 | 2017: 8 | 2016: 4
ISCA Section: special session: 2 | robust speaker recognition and anti-spoofing: 2 | speech coding and privacy: 1 | language and accent recognition: 1 | oriental language recognition: 1 | voice anti-spoofing and countermeasure: 1 | speaker embedding: 1 | the 2019 automatic speaker verification spoofing and countermeasures challenge: 1 | speaker verification: 1 | speaker recognition evaluation: 1 | speaker database and anti-spoofing: 1 | short utterances speaker recognition: 1 | speech and audio segmentation and classification: 1
IEEE Keyword: speaker recognition: 7 | speaker verification: 5 | security of data: 3 | voice conversion: 2 | privacy: 2 | data privacy: 2 | speech recognition: 2 | spoofing: 2 | linkability: 1 | natural language processing: 1 | speaker anonymization: 1 | nonlinear compression: 1 | multi regime compression: 1 | data compression: 1 | deep learning (artificial intelligence): 1 | spoofing counter measures: 1 | automatic speaker verification (asv): 1 | presentation attack detection: 1 | detection cost function: 1 | linkage attack: 1 | mimicry: 1 | gender classification: 1 | gender dependent system: 1 | replay: 1 | replay attack: 1 | gaussian processes: 1 | signal classification: 1 | asvspoof 2015: 1 | maximum likelihood estimation: 1 | cepstral analysis: 1 | biometrics (access control): 1 | spoofing attack: 1 | mixture models: 1 | generalized countermeasure: 1 | btas 2016: 1
Most Publications: 2021: 34 | 2022: 23 | 2023: 13 | 2019: 13 | 2018: 13

Affiliations
URLs

SpeechComm2023 Premjeet Singh, Md. Sahidullah, Goutam Saha 0001, 
Modulation spectral features for speech emotion recognition using deep neural networks.

TASLP2022 Brij Mohan Lal Srivastava, Mohamed Maouche, Md. Sahidullah, Emmanuel Vincent 0001, Aurélien Bellet, Marc Tommasi, Natalia A. Tomashenko, Xin Wang 0037, Junichi Yamagishi, 
Privacy and Utility of X-Vector Based Speaker Anonymization.

ICASSP2022 Xuechen Liu, Md. Sahidullah, Tomi Kinnunen, 
Learnable Nonlinear Compression for Robust Speaker Verification.

Interspeech2021 Bhusan Chettri, Rosa González Hautamäki, Md. Sahidullah, Tomi Kinnunen, 
Data Quality as Predictor of Voice Anti-Spoofing Generalization.

Interspeech2021 Raphaël Duroselle, Md. Sahidullah, Denis Jouvet, Irina Illina, 
Modeling and Training Strategies for Language Recognition Systems.

Interspeech2021 Raphaël Duroselle, Md. Sahidullah, Denis Jouvet, Irina Illina, 
Language Recognition on Unknown Conditions: The LORIA-Inria-MULTISPEECH System for AP20-OLR Challenge.

Interspeech2021 Tomi Kinnunen, Andreas Nautsch, Md. Sahidullah, Nicholas W. D. Evans, Xin Wang 0037, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee, 
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing.

TASLP2020 Tomi Kinnunen, Héctor Delgado, Nicholas W. D. Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang 0037, Md. Sahidullah, Junichi Yamagishi, Douglas A. Reynolds, 
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals.

ICASSP2020 Brij Mohan Lal Srivastava, Nathalie Vauquier, Md. Sahidullah, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent 0001, 
Evaluating Voice Conversion-Based Privacy Protection against Informed Attackers.

Interspeech2020 Xuechen Liu, Md. Sahidullah, Tomi Kinnunen, 
A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings.

ICASSP2019 Tomi Kinnunen, Rosa González Hautamäki, Ville Vestman, Md. Sahidullah
Can We Use Speaker Recognition Technology to Attack Itself? Enhancing Mimicry Attacks Using Automatic Target Speaker Selection.

Interspeech2019 Massimiliano Todisco, Xin Wang 0037, Ville Vestman, Md. Sahidullah, Héctor Delgado, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Tomi H. Kinnunen, Kong Aik Lee, 
ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection.

SpeechComm2018 Rosa González Hautamäki, Md. Sahidullah, Ville Hautamäki, Tomi Kinnunen, 
Acoustical and perceptual study of voice disguise by age modification in speaker verification.

SpeechComm2018 Ville Vestman, Dhananjaya N. Gowda, Md. Sahidullah, Paavo Alku, Tomi Kinnunen, 
Speaker recognition from whispered speech: A tutorial survey and an application of time-varying linear prediction.

Interspeech2018 Massimiliano Todisco, Héctor Delgado, Kong-Aik Lee, Md. Sahidullah, Nicholas W. D. Evans, Tomi Kinnunen, Junichi Yamagishi, 
Integrated Presentation Attack Detection and Automatic Speaker Verification: Common Features and Gaussian Back-end Fusion.

SpeechComm2017 Cemal Hanilçi, Tomi Kinnunen, Md. Sahidullah, Aleksandr Sizov, 
Spoofing detection goes noisy: An analysis of synthetic speech detection in the presence of additive noise.

ICASSP2017 Anssi Kanervisto, Ville Vestman, Md. Sahidullah, Ville Hautamäki, Tomi Kinnunen, 
Effects of gender information in text-independent and text-dependent speaker verification.

ICASSP2017 Tomi Kinnunen, Md. Sahidullah, Mauro Falcone, Luca Costantini, Rosa González Hautamäki, Dennis Alexander Lehmann Thomsen, Achintya Kumar Sarkar, Zheng-Hua Tan, Héctor Delgado, Massimiliano Todisco, Nicholas W. D. Evans, Ville Hautamäki, Kong-Aik Lee, 
RedDots replayed: A new replay spoofing attack corpus for text-dependent speaker verification research.

ICASSP2017 Dipjyoti Paul, Md. Sahidullah, Goutam Saha 0001, 
Generalization of spoofing countermeasures: A case study with ASVspoof 2015 and BTAS 2016 corpora.

Interspeech2017 Tomi Kinnunen, Md. Sahidullah, Héctor Delgado, Massimiliano Todisco, Nicholas W. D. Evans, Junichi Yamagishi, Kong-Aik Lee, 
The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection.

#149  | Naoyuki Kanda | Google Scholar   DBLP
Venues: Interspeech: 20 | ICASSP: 6 | SpeechComm: 1
Years: 2022: 6 | 2021: 10 | 2020: 2 | 2019: 5 | 2018: 1 | 2016: 3
ISCA Section: robust asr, and far-field/multi-talker asr: 2 | source separation: 2 | multi- and cross-lingual asr, other topics in asr: 2 | other topics in speech recognition: 1 | novel models and training methods for asr: 1 | streaming for asr/rnn transducers: 1 | applications in transcription, education and learning: 1 | neural network training methods for asr: 1 | asr neural network architectures: 1 | training strategies for asr: 1 | speaker recognition: 1 | turn management in dialogue: 1 | far-field speech recognition: 1 | asr neural network training: 1 | neural network training strategies for asr: 1 | language modeling for conversational speech and confidence measures: 1 | decoding, system combination: 1
IEEE Keyword: speech recognition: 5 | speaker recognition: 3 | natural language processing: 3 | speaker counting: 2 | speaker diarization: 2 | audio signal processing: 2 | speech separation: 2 | voice activity detection: 1 | rich transcription: 1 | probability: 1 | bayes methods: 1 | minimum bayes risk training: 1 | speaker identification: 1 | recurrent neural network transducer: 1 | attention based encoder decoder: 1 | language model: 1 | end to end approach: 1 | spoken language understanding: 1 | pre training: 1 | transfer learning: 1 | self supervised learning: 1 | filtering theory: 1 | source separation: 1 | system fusion: 1 | acoustic model: 1 | speech enhancement: 1
Most Publications: 2021: 33 | 2022: 20 | 2020: 12 | 2019: 12 | 2023: 3

Affiliations
URLs

ICASSP2022 Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.

Interspeech2022 Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.

Interspeech2022 Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.

Interspeech2022 Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li 0001, Xie Chen 0001, Yu Wu 0012, Yifan Gong 0001, 
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.

Interspeech2022 Xiaofei Wang 0009, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka, 
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation.

Interspeech2022 Wangyou Zhang, Zhuo Chen 0006, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei, 
Separating Long-Form Speech with Group-wise Permutation Invariant Training.

ICASSP2021 Naoyuki Kanda, Zhong Meng, Liang Lu 0001, Yashesh Gaur, Xiaofei Wang 0009, Zhuo Chen 0006, Takuya Yoshioka, 
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.

ICASSP2021 Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu 0001, Xie Chen 0001, Jinyu Li 0001, Yifan Gong 0001, 
Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition.

ICASSP2021 Yao Qian, Ximo Bian, Yu Shi 0001, Naoyuki Kanda, Leo Shen, Zhen Xiao, Michael Zeng 0001, 
Speech-Language Pre-Training for End-to-End Spoken Language Understanding.

ICASSP2021 Xiong Xiao, Naoyuki Kanda, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao 0008, Gang Liu, Yu Wu 0012, Jian Wu 0027, Shujie Liu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.

Interspeech2021 Liang Lu 0001, Naoyuki Kanda, Jinyu Li 0001, Yifan Gong 0001, 
Streaming Multi-Talker Speech Recognition with Joint Speaker Identification.

Interspeech2021 Liang Lu 0001, Zhong Meng, Naoyuki Kanda, Jinyu Li 0001, Yifan Gong 0001, 
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer.

Interspeech2021 Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
End-to-End Speaker-Attributed ASR with Transformer.

Interspeech2021 Naoyuki Kanda, Guoli Ye, Yu Wu 0012, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone.

Interspeech2021 Zhong Meng, Yu Wu 0012, Naoyuki Kanda, Liang Lu 0001, Xie Chen 0001, Guoli Ye, Eric Sun, Jinyu Li 0001, Yifan Gong 0001, 
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition.

Interspeech2021 Jian Wu 0027, Zhuo Chen 0006, Sanyuan Chen, Yu Wu 0012, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, 
Investigation of Practical Aspects of Single Channel Speech Separation for ASR.

Interspeech2020 Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, 
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers.

Interspeech2020 Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Takuya Yoshioka, 
Serialized Output Training for End-to-End Overlapped Speech Recognition.

ICASSP2019 Naoyuki Kanda, Yusuke Fujita, Shota Horiguchi, Rintaro Ikeshita, Kenji Nagamatsu, Shinji Watanabe 0001, 
Acoustic Modeling for Distant Multi-talker Speech Recognition with Single- and Multi-channel Branches.

Interspeech2019 Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe 0001, 
End-to-End Neural Speaker Diarization with Permutation-Free Objectives.

#150  | Rohan Kumar Das | Google Scholar   DBLP
Venues: Interspeech: 17 | ICASSP: 7 | TASLP: 2 | SpeechComm: 1
Years: 2022: 2 | 2021: 3 | 2020: 8 | 2019: 10 | 2017: 3 | 2016: 1
ISCA Section: speaker recognition: 2 | speech and speaker recognition: 2 | speech signal characterization: 2 | special session: 2 | the first dicova challenge: 1 | the attacker's perspective on automatic speaker verification: 1 | the interspeech 2020 far field speaker verification challenge: 1 | speaker recognition challenges and applications: 1 | anti-spoofing and liveness detection: 1 | the interspeech 2019 computational paralinguistics challenge (compare): 1 | the 2019 automatic speaker verification spoofing and countermeasures challenge: 1 | speaker recognition evaluation: 1 | speaker and language recognition applications: 1
IEEE Keyword: speaker recognition: 7 | cepstral analysis: 3 | convolutional neural nets: 2 | signal detection: 2 | speech recognition: 2 | anti spoofing: 2 | synthetic speech detection: 2 | speech synthesis: 2 | security of data: 2 | natural language processing: 2 | multi scale frequency channel attention: 1 | text independent speaker verification: 1 | short utterance: 1 | pattern classification: 1 | self supervised speaker recognition: 1 | pseudo label selection: 1 | loss gated learning: 1 | unsupervised learning: 1 | gaussian processes: 1 | signal classification: 1 | unknown kind spoofing detection: 1 | transforms: 1 | mixture models: 1 | constant q modified octave coefficients: 1 | modified magnitude phase spectrum: 1 | data augmentation: 1 | signal companding: 1 | chains corpus: 1 | speaker characterization: 1 | whispered speech: 1 | vocal tract constriction: 1 | generalized countermeasures: 1 | synthetic attacks: 1 | asvspoof 2019: 1 | replay attacks: 1 | end to end: 1 | word processing: 1 | text to speech: 1 | text analysis: 1 | crosslingual word embedding: 1 | code switching: 1 | constant q multi level coefficients (cmc): 1 | multi level transform (mlt): 1 | voice activity detection: 1 | replay speech detection: 1 | phonetic posteriorgram (ppg): 1 | voice conversion: 1 | average modeling approach (ama): 1 | cross lingual: 1
Most Publications2020: 252019: 212021: 112022: 102018: 9

ICASSP2022 Tianchi Liu 0004, Rohan Kumar Das, Kong Aik Lee, Haizhou Li 0001, 
MFA: TDNN with Multi-Scale Frequency-Channel Attention for Text-Independent Speaker Verification with Short Utterances.

ICASSP2022 Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li 0001, 
Self-Supervised Speaker Recognition with Loss-Gated Learning.

TASLP2021 Jichen Yang, Hongji Wang, Rohan Kumar Das, Yanmin Qian, 
Modified Magnitude-Phase Spectrum Information for Spoofing Detection.

ICASSP2021 Rohan Kumar Das, Jichen Yang, Haizhou Li 0001, 
Data Augmentation with Signal Companding for Detection of Logical Access Attacks.

Interspeech2021 Rohan Kumar Das, Maulik C. Madhavi, Haizhou Li 0001, 
Diagnosis of COVID-19 Using Auditory Acoustic Cues.

ICASSP2020 Rohan Kumar Das, Haizhou Li 0001, 
On the Importance of Vocal Tract Constriction for Speaker Characterization: The Whispered Speech Study.

ICASSP2020 Rohan Kumar Das, Jichen Yang, Haizhou Li 0001, 
Assessing the Scope of Generalized Countermeasures for Anti-Spoofing.

ICASSP2020 Xuehao Zhou, Xiaohai Tian, Grandee Lee, Rohan Kumar Das, Haizhou Li 0001, 
End-to-End Code-Switching TTS with Cross-Lingual Language Model.

Interspeech2020 Tianchi Liu 0004, Rohan Kumar Das, Maulik C. Madhavi, Shengmei Shen, Haizhou Li 0001, 
Speaker-Utterance Dual Attention for Speaker and Utterance Verification.

Interspeech2020 Rohan Kumar Das, Xiaohai Tian, Tomi Kinnunen, Haizhou Li 0001, 
The Attacker's Perspective on Automatic Speaker Verification: An Overview.

Interspeech2020 Xiaoyi Qin, Ming Li 0026, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, Haizhou Li 0001, 
The INTERSPEECH 2020 Far-Field Speaker Verification Challenge.

Interspeech2020 Ruijie Tao, Rohan Kumar Das, Haizhou Li 0001, 
Audio-Visual Speaker Recognition with a Cross-Modal Discriminative Network.

Interspeech2020 Zhenzong Wu, Rohan Kumar Das, Jichen Yang, Haizhou Li 0001, 
Light Convolutional Neural Network with Feature Genuinization for Detection of Synthetic Speech Attacks.

TASLP2019 Jichen Yang, Rohan Kumar Das, Nina Zhou, 
Extraction of Octave Spectra Information for Spoofing Attack Detection.

ICASSP2019 Yi Zhou 0020, Xiaohai Tian, Haihua Xu, Rohan Kumar Das, Haizhou Li 0001, 
Cross-lingual Voice Conversion with Bilingual Phonetic Posteriorgram and Average Modeling.

Interspeech2019 Rohan Kumar Das, Haizhou Li 0001, 
Instantaneous Phase and Long-Term Acoustic Cues for Orca Activity Detection.

Interspeech2019 Rohan Kumar Das, Jichen Yang, Haizhou Li 0001, 
Long Range Acoustic Features for Spoofed Speech Detection.

Interspeech2019 Sarfaraz Jelil, Abhishek Shrivastava, Rohan Kumar Das, S. R. Mahadeva Prasanna, Rohit Sinha 0003, 
SpeechMarker: A Voice Based Multi-Level Attendance Application.

Interspeech2019 Kong Aik Lee, Ville Hautamäki, Tomi H. Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang 0019, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li 0001, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang 0039, Chunlei Zhang, Fahimeh Bahmaninezhad, Héctor Delgado, Massimiliano Todisco, 
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences.

Interspeech2019 Tianchi Liu 0004, Maulik C. Madhavi, Rohan Kumar Das, Haizhou Li 0001, 
A Unified Framework for Speaker and Utterance Verification.

#151 | Yan Song 0001 | Google Scholar | DBLP
Venues: Interspeech: 15; ICASSP: 9; TASLP: 3
Years: 2022: 4; 2021: 4; 2020: 5; 2019: 5; 2018: 7; 2017: 1; 2016: 1
ISCA Sections: speaker and language recognition: 2; language and accent recognition: 1; acoustic event detection and acoustic scene classification: 1; learning techniques for speaker recognition: 1; asr neural network architectures and training: 1; acoustic event detection: 1; speaker recognition and diarization: 1; speaker verification using neural network methods: 1; representation learning for emotion: 1; acoustic scenes and rare events: 1; novel neural network architectures for acoustic modelling: 1; speaker verification: 1; language recognition: 1; speech and audio segmentation and classification: 1
IEEE Keywords: speech recognition: 7; speaker recognition: 5; convolutional neural nets: 3; speech separation: 3; supervised learning: 2; speaker verification: 2; deep learning (artificial intelligence): 2; source separation: 2; recurrent neural nets: 2; sound event detection: 2; audio signal processing: 2; audio tagging: 2; label permutation problem: 2; natural language processing: 2; representation learning: 1; anomalous sound detection: 1; knowledge based systems: 1; self supervised learning: 1; end to end: 1; unsupervised domain adaptation: 1; label smoothing: 1; knowledge distillation: 1; emotion recognition: 1; convolutional neural network: 1; signal reconstruction: 1; style transformation: 1; speech emotion recognition: 1; disentanglement: 1; probability: 1; sequence alignment: 1; encoder decoder: 1; post inference: 1; inference mechanisms: 1; end to end asr: 1; multi granularity: 1; embedding learning: 1; dense residual networks: 1; model ensemble: 1; signal representation: 1; speaker identification: 1; time domain: 1; target tracking: 1; time domain analysis: 1; sparse encoder: 1; semi supervised learning: 1; weakly labeled: 1; autoregressive processes: 1; computational auditory scene analysis: 1; agglomerative hierarchical clustering: 1; topic detection: 1; document handling: 1; hidden markov models: 1; pattern clustering: 1; consensus analysis: 1; signal classification: 1; weakly labelled data: 1; attention: 1; statistics: 1; language identification deep neural network i vector lid senones: 1
Most Publications: 2018: 10; 2007: 10; 2019: 9; 2006: 9; 2021: 8

Affiliations
University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing, Hefei, China

ICASSP2022 Han Chen, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Self-Supervised Representation Learning for Unsupervised Anomalous Sound Detection Under Domain Shift.

ICASSP2022 Hang-Rui Hu, Yan Song 0001, Ying Liu, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Domain Robust Deep Embedding Learning for Speaker Recognition.

ICASSP2022 Yuxuan Xi, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Frontend Attributes Disentanglement for Speech Emotion Recognition.

Interspeech2022 Hang-Rui Hu, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
Class-Aware Distribution Alignment based Unsupervised Domain Adaptation for Speaker Verification.

TASLP2021 Jian Tang, Jie Zhang 0042, Yan Song 0001, Ian McLoughlin 0001, Li-Rong Dai 0001, 
Multi-Granularity Sequence Alignment Mapping for Encoder-Decoder Based End-to-End ASR.

ICASSP2021 Ying Liu, Yan Song 0001, Ian McLoughlin 0001, Lin Liu, Li-Rong Dai 0001, 
An Effective Deep Embedding Learning Method Based on Dense-Residual Networks for Speaker Verification.

Interspeech2021 Hui Wang, Lin Liu, Yan Song 0001, Lei Fang, Ian McLoughlin 0001, Li-Rong Dai 0001, 
A Weight Moving Average Based Alternate Decoupled Learning Algorithm for Long-Tailed Language Identification.

Interspeech2021 Xu Zheng, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
An Effective Mutual Mean Teaching Based Domain Adaptation Method for Sound Event Detection.

ICASSP2020 Hui Wang, Yan Song 0001, Zengxi Li, Ian McLoughlin 0001, Li-Rong Dai 0001, 
An Online Speaker-aware Speech Separation Approach Based on Time-domain Representation.

ICASSP2020 Jie Yan, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, 
Task-Aware Mean Teacher Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection.

Interspeech2020 Ying Liu, Yan Song 0001, Yiheng Jiang, Ian McLoughlin 0001, Lin Liu, Li-Rong Dai 0001, 
An Effective Speaker Recognition Method Based on Joint Identification and Verification Supervisions.

Interspeech2020 Zi-qiang Zhang, Yan Song 0001, Jian-Shu Zhang, Ian McLoughlin 0001, Li-Rong Dai 0001, 
Semi-Supervised End-to-End ASR via Teacher-Student Learning with Conditional Posterior Distribution.

Interspeech2020 Xu Zheng, Yan Song 0001, Jie Yan, Li-Rong Dai 0001, Ian McLoughlin 0001, Lin Liu, 
An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection.

TASLP2019 Zengxi Li, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, 
Listening and Grouping: An Online Autoregressive Approach for Monaural Speech Separation.

ICASSP2019 Jian Sun, Wu Guo, Zhi Chen, Yan Song 0001
Topic Detection in Conversational Telephone Speech Using CNN with Multi-stream Inputs.

ICASSP2019 Jie Yan, Yan Song 0001, Wu Guo, Li-Rong Dai 0001, Ian McLoughlin 0001, Liang Chen, 
A Region Based Attention Method for Weakly Supervised Sound Event Detection and Classification.

Interspeech2019 Zhifu Gao, Yan Song 0001, Ian McLoughlin 0001, Pengcheng Li, Yiheng Jiang, Li-Rong Dai 0001, 
Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System.

Interspeech2019 Yiheng Jiang, Yan Song 0001, Ian McLoughlin 0001, Zhifu Gao, Li-Rong Dai 0001, 
An Effective Deep Embedding Learning Architecture for Speaker Verification.

TASLP2018 Ma Jin, Yan Song 0001, Ian McLoughlin 0001, Li-Rong Dai 0001, 
LID-Senones and Their Statistics for Language Identification.

ICASSP2018 Zengxi Li, Yan Song 0001, Li-Rong Dai 0001, Ian McLoughlin 0001, 
Source-Aware Context Network for Single-Channel Multi-Speaker Speech Separation.

#152 | Yanzhang He | Google Scholar | DBLP
Venues: Interspeech: 16; ICASSP: 11
Years: 2022: 10; 2021: 8; 2020: 7; 2019: 2
ISCA Sections: asr: 2; asr technologies and systems: 1; speech segmentation: 1; search/decoding algorithms for asr: 1; multi-, cross-lingual and other topics in asr: 1; novel models and training methods for asr: 1; resource-constrained asr: 1; search/decoding techniques and confidence measures for asr: 1; spoken term detection & voice search: 1; streaming for asr/rnn transducers: 1; speech classification: 1; streaming asr: 1; evaluation of speech technology systems and methods for resource construction and annotation: 1; single-channel speech enhancement: 1; asr neural network architectures: 1
IEEE Keywords: speech recognition: 10; recurrent neural nets: 6; natural language processing: 3; rnn t: 3; probability: 3; speech coding: 3; confidence scores: 2; automatic speech recognition: 2; end to end asr: 2; conformer: 2; latency: 2; optimisation: 2; domain adaptation: 1; semi supervised learning (artificial intelligence): 1; self supervised learning: 1; semi supervised learning: 1; estimation theory: 1; out of domain: 1; feature selection: 1; end to end: 1; rnnt: 1; two pass asr: 1; long form asr: 1; speaker recognition: 1; cascaded encoders: 1; hidden markov models: 1; calibration: 1; mean square error methods: 1; voice activity detection: 1; attention based end to end models: 1; transformer: 1; confidence: 1; regression analysis: 1; endpointer: 1; vocabulary: 1; supervised learning: 1; decoding: 1; text analysis: 1; mobile handsets: 1
Most Publications: 2022: 24; 2021: 19; 2020: 15; 2019: 5; 2015: 4

ICASSP2022 Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He
Large-Scale ASR Domain Adaptation Using Self- and Semi-Supervised Learning.

ICASSP2022 Qiujia Li, Yu Zhang 0033, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland, 
Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition.

ICASSP2022 Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.

Interspeech2022 Shuo-Yiin Chang, Bo Li 0028, Tara N. Sainath, Chao Zhang, Trevor Strohman, Qiao Liang, Yanzhang He
Turn-Taking Prediction for Natural Conversational Speech.

Interspeech2022 Shaojin Ding, Phoenix Meadowlark, Yanzhang He, Lukasz Lew, Shivani Agrawal, Oleg Rybakov, 
4-bit Conformer with Native Quantization Aware Training for Speech Recognition.

Interspeech2022 Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw, 
Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition.

Interspeech2022 Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang 0016, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman, 
A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes.

Interspeech2022 Ke Hu, Tara N. Sainath, Yanzhang He, Rohit Prabhavalkar, Trevor Strohman, Sepand Mavandadi, Weiran Wang, 
Improving Deliberation by Text-Only and Semi-Supervised Training.

Interspeech2022 Bo Li 0028, Tara N. Sainath, Ruoming Pang, Shuo-Yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani, 
A Language Agnostic Multilingual Streaming On-Device ASR System.

Interspeech2022 Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, W. Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach, 
Improving Rare Word Recognition with LM-aware MWER Training.

ICASSP2021 Bo Li 0028, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han 0002, Qiao Liang, Yu Zhang 0033, Trevor Strohman, Yonghui Wu, 
A Better and Faster end-to-end Model for Streaming ASR.

ICASSP2021 Qiujia Li, David Qiu, Yu Zhang 0033, Bo Li 0028, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman, 
Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition.

ICASSP2021 David Qiu, Qiujia Li, Yanzhang He, Yu Zhang 0033, Bo Li 0028, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li 0133, Ke Hu, Tara N. Sainath, Ian McGraw, 
Learning Word-Level Confidence for Subword End-To-End ASR.

ICASSP2021 Jiahui Yu, Chung-Cheng Chiu, Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han 0002, Anmol Gulati, Yonghui Wu, Ruoming Pang, 
FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.

Interspeech2021 Rami Botros, Tara N. Sainath, Robert David, Emmanuel Guzman, Wei Li 0133, Yanzhang He
Tied & Reduced RNN-T Decoder.

Interspeech2021 David Qiu, Yanzhang He, Qiujia Li, Yu Zhang 0033, Liangliang Cao, Ian McGraw, 
Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction.

Interspeech2021 Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ding Zhao, Yiteng Huang, Arun Narayanan, Ian McGraw, 
Personalized Keyphrase Detection Using Speaker and Environment Information.

Interspeech2021 Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Ruoming Pang, David Rybach, Cyril Allauzen, Ehsan Variani, James Qin, Quoc-Nam Le-The, Shuo-Yiin Chang, Bo Li 0028, Anmol Gulati, Jiahui Yu, Chung-Cheng Chiu, Diamantino Caseiro, Wei Li 0133, Qiao Liang, Pat Rondon, 
An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling.

ICASSP2020 Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Ruoming Pang, Yanzhang He, Trevor Strohman, Yonghui Wu, 
Towards Fast and Accurate Streaming End-To-End ASR.

ICASSP2020 Tara N. Sainath, Yanzhang He, Bo Li 0028, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li 0133, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alexander Gruenstein, Ke Hu, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirkó Visontai, Yonghui Wu, Yu Zhang 0033, Ding Zhao, 
A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency.

#153 | Shujie Liu 0001 | Google Scholar | DBLP
Venues: Interspeech: 12; ICASSP: 9; AAAI: 3; ACL: 2; ICML: 1
Years: 2022: 11; 2021: 8; 2020: 7; 2019: 1
ISCA Sections: novel models and training methods for asr: 3; source separation: 3; speaker and language recognition: 1; multi- and cross-lingual asr, other topics in asr: 1; asr model training and strategies: 1; streaming asr: 1; asr neural network architectures: 1; speech synthesis: 1
IEEE Keywords: speaker recognition: 5; speech recognition: 5; natural language processing: 4; transformer: 4; speaker verification: 2; speech enhancement: 2; self supervised learning: 2; speech separation: 2; source separation: 2; recurrent neural nets: 2; representation learning: 1; unsupervised learning: 1; image representation: 1; self supervised pretrain: 1; text analysis: 1; speaker identification: 1; robust automatic speech recognition: 1; supervised learning: 1; automatic speech recognition: 1; transformer transducer: 1; configurable multilingual model: 1; multilingual speech recognition: 1; signal representation: 1; multi channel microphone: 1; deep learning (artificial intelligence): 1; transducer: 1; decoding: 1; encoding: 1; real time decoding: 1; conformer: 1; multi speaker asr: 1; continuous speech separation: 1; filtering theory: 1; audio signal processing: 1; speaker diarization: 1; system fusion: 1
Most Publications: 2022: 31; 2021: 29; 2020: 23; 2018: 15; 2019: 12

Affiliations
Microsoft Research Asia, Beijing, China

ICASSP2022 Zhengyang Chen, Sanyuan Chen, Yu Wu 0012, Yao Qian, Chengyi Wang 0002, Shujie Liu 0001, Yanmin Qian, Michael Zeng 0001, 
Large-Scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification.

ICASSP2022 Rui Wang, Junyi Ao, Long Zhou, Shujie Liu 0001, Zhihua Wei, Tom Ko, Qing Li, Yu Zhang 0006, 
Multi-View Self-Attention Based Transformer for Speaker Recognition.

ICASSP2022 Heming Wang, Yao Qian, Xiaofei Wang 0009, Yiming Wang, Chengyi Wang 0002, Shujie Liu 0001, Takuya Yoshioka, Jinyu Li 0001, DeLiang Wang, 
Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction.

ICASSP2022 Chengyi Wang 0002, Yu Wu 0012, Sanyuan Chen, Shujie Liu 0001, Jinyu Li 0001, Yao Qian, Zhenglu Yang, 
Improving Self-Supervised Learning for Speech Recognition with Intermediate Layer Supervision.

ICASSP2022 Long Zhou, Jinyu Li 0001, Eric Sun, Shujie Liu 0001
A Configurable Multilingual Model is All You Need to Recognize All Languages.

Interspeech2022 Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu 0001, Haizhou Li 0001, Tom Ko, Li-Rong Dai 0001, Jinyu Li 0001, Yao Qian, Furu Wei, 
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data.

Interspeech2022 Sanyuan Chen, Yu Wu 0012, Chengyi Wang 0002, Shujie Liu 0001, Zhuo Chen 0006, Peidong Wang, Gang Liu, Jinyu Li 0001, Jian Wu 0027, Xiangzhan Yu, Furu Wei, 
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

Interspeech2022 Shuo Ren, Shujie Liu 0001, Yu Wu 0012, Long Zhou, Furu Wei, 
Speech Pre-training with Acoustic Piece.

Interspeech2022 Chengyi Wang 0002, Yiming Wang, Yu Wu 0012, Sanyuan Chen, Jinyu Li 0001, Shujie Liu 0001, Furu Wei, 
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training.

Interspeech2022 Wangyou Zhang, Zhuo Chen 0006, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei, 
Separating Long-Form Speech with Group-wise Permutation Invariant Training.

ACL2022 Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang 0002, Shuo Ren, Yu Wu 0012, Shujie Liu 0001, Tom Ko, Qing Li, Yu Zhang 0006, Zhihua Wei, Yao Qian, Jinyu Li 0001, Furu Wei, 
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing.

ICASSP2021 Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Takuya Yoshioka, Shujie Liu 0001, Jinyu Li 0001, Xiangzhan Yu, 
Don't Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer.

ICASSP2021 Xie Chen 0001, Yu Wu 0012, Zhenghao Wang, Shujie Liu 0001, Jinyu Li 0001, 
Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset.

ICASSP2021 Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Jinyu Li 0001, Takuya Yoshioka, Chengyi Wang 0002, Shujie Liu 0001, Ming Zhou 0001, 
Continuous Speech Separation with Conformer.

ICASSP2021 Xiong Xiao, Naoyuki Kanda, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao 0008, Gang Liu, Yu Wu 0012, Jian Wu 0027, Shujie Liu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.

Interspeech2021 Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Shujie Liu 0001, Jinyu Li 0001, Xiangzhan Yu, 
Ultra Fast Speech Separation Model with Teacher Student Learning.

Interspeech2021 Eric Sun, Jinyu Li 0001, Zhong Meng, Yu Wu 0012, Jian Xue, Shujie Liu 0001, Yifan Gong 0001, 
Improving Multilingual Transformer Transducer Models by Reducing Language Confusions.

Interspeech2021 Jian Wu 0027, Zhuo Chen 0006, Sanyuan Chen, Yu Wu 0012, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, 
Investigation of Practical Aspects of Single Channel Speech Separation for ASR.

ICML2021 Chengyi Wang 0002, Yu Wu 0012, Yao Qian, Ken'ichi Kumatani, Shujie Liu 0001, Furu Wei, Michael Zeng 0001, Xuedong Huang 0001, 
UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data.

Interspeech2020 Chengyi Wang 0002, Yu Wu 0012, Yujiao Du, Jinyu Li 0001, Shujie Liu 0001, Liang Lu 0001, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou 0001, 
Semantic Mask for Transformer Based End-to-End Speech Recognition.

#154 | Reinhold Häb-Umbach | Google Scholar | DBLP
Venues: Interspeech: 18; ICASSP: 6; TASLP: 1; SpeechComm: 1
Years: 2022: 3; 2021: 2; 2020: 6; 2019: 6; 2018: 3; 2017: 3; 2016: 3
ISCA Sections: source separation: 2; far-field speech recognition: 2; speaker embedding and diarization: 1; voice conversion and adaptation: 1; the fearless steps challenge phase-02: 1; monaural source separation: 1; diarization: 1; speech enhancement: 1; privacy in speech and audio interfaces: 1; distant asr: 1; zero-resource speech recognition: 1; multi-channel speech enhancement: 1; speech and audio segmentation and classification: 1; special session: 1; speech enhancement and noise reduction: 1; speech enhancement and applications: 1
IEEE Keywords: source separation: 5; blind source separation: 4; reverberation: 4; array signal processing: 3; speech enhancement: 3; speech recognition: 3; optimisation: 2; dereverberation: 2; frequency domain analysis: 2; time domain analysis: 2; audio signal processing: 2; backpropagation: 2; speech separation: 2; speaker recognition: 2; complex backpropagation: 1; transfer functions: 1; signal to distortion ratio: 1; convolution: 1; multi channel source separation: 1; acoustic beamforming: 1; maximum likelihood estimation: 1; beamforming: 1; automatic speech recognition: 1; filtering theory: 1; microphone array: 1; mean square error methods: 1; multichannel source separation: 1; robust automatic speech recognition: 1; convolutional neural nets: 1; time domain: 1; hidden markov models: 1; joint training: 1; multi speaker speech recognition: 1; computational complexity: 1; end to end speech recognition: 1; iterative methods: 1; joint optimization: 1; least squares approximations: 1; robust asr: 1; source counting: 1; neural network: 1; online processing: 1; meeting diarization: 1; clustering: 1; directional statistics: 1; transient response: 1; complex hypersphere: 1; sparseness: 1
Most Publications: 2021: 25; 2019: 23; 2020: 18; 2022: 16; 2013: 15

Affiliations
University of Paderborn, Department of Electrical Engineering and Information Technology, Germany

Interspeech2022 Christoph Böddeker, Tobias Cord-Landwehr, Thilo von Neumann, Reinhold Haeb-Umbach
An Initialization Scheme for Meeting Separation with Spatial Mixture Models.

Interspeech2022 Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Christoph Böddeker, Reinhold Haeb-Umbach
Utterance-by-utterance overlap-aware neural diarization with Graph-PIT.

Interspeech2022 Michael Kuhlmann, Fritz Seebauer, Janek Ebbers, Petra Wagner, Reinhold Haeb-Umbach
Investigation into Target Speaking Rate Adaptation for Voice Conversion.

ICASSP2021 Christoph Böddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Reinhold Haeb-Umbach
Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.

Interspeech2021 Thilo von Neumann, Keisuke Kinoshita, Christoph Böddeker, Marc Delcroix, Reinhold Haeb-Umbach
Graph-PIT: Generalized Permutation Invariant Training for Continuous Separation of Arbitrary Numbers of Speakers.

TASLP2020 Tomohiro Nakatani, Christoph Böddeker, Keisuke Kinoshita, Rintaro Ikeshita, Marc Delcroix, Reinhold Haeb-Umbach
Jointly Optimal Denoising, Dereverberation, and Source Separation.

ICASSP2020 Jens Heitkaemper, Darius Jakobeit, Christoph Böddeker, Lukas Drude, Reinhold Haeb-Umbach
Demystifying TasNet: A Dissecting Approach.

ICASSP2020 Thilo von Neumann, Keisuke Kinoshita, Lukas Drude, Christoph Böddeker, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach
End-to-End Training of Time Domain Audio Separation and Recognition.

Interspeech2020 Jens Heitkaemper, Joerg Schmalenstroeer, Reinhold Haeb-Umbach
Statistical and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic Environments.

Interspeech2020 Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach
Multi-Path RNN for Hierarchical Modeling of Long Sequential Data and its Application to Speaker Stream Separation.

Interspeech2020 Thilo von Neumann, Christoph Böddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach
Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR.

ICASSP2019 Jahn Heymann, Lukas Drude, Reinhold Haeb-Umbach, Keisuke Kinoshita, Tomohiro Nakatani, 
Joint Optimization of Neural Network-based WPE Dereverberation and Acoustic Model for Robust Online ASR.

ICASSP2019 Thilo von Neumann, Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani, Reinhold Haeb-Umbach
All-neural Online Source Separation, Counting, and Diarization for Meeting Analysis.

Interspeech2019 Lukas Drude, Jahn Heymann, Reinhold Haeb-Umbach
Unsupervised Training of Neural Mask-Based Beamforming.

Interspeech2019 Naoyuki Kanda, Christoph Böddeker, Jens Heitkaemper, Yusuke Fujita, Shota Horiguchi, Kenji Nagamatsu, Reinhold Haeb-Umbach
Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR.

Interspeech2019 Juan M. Martín-Doñas, Jens Heitkaemper, Reinhold Haeb-Umbach, Angel M. Gomez, Antonio M. Peinado, 
Multi-Channel Block-Online Source Extraction Based on Utterance Adaptation.

Interspeech2019 Alexandru Nelus, Janek Ebbers, Reinhold Haeb-Umbach, Rainer Martin 0001, 
Privacy-Preserving Variational Information Feature Extraction for Domestic Activity Monitoring versus Speaker Identification.

SpeechComm2018 Vladimir Despotovic, Oliver Walter, Reinhold Haeb-Umbach
Machine learning techniques for semantic analysis of dysarthric speech: An experimental study.

Interspeech2018 Lukas Drude, Christoph Böddeker, Jahn Heymann, Reinhold Haeb-Umbach, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, 
Integrating Neural Network Based Beamforming and Weighted Prediction Error Dereverberation.

Interspeech2018 Thomas Glarner, Patrick Hanebrink, Janek Ebbers, Reinhold Haeb-Umbach
Full Bayesian Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery.

#155 | Ariya Rastrow | Google Scholar | DBLP
Venues: Interspeech: 17; ICASSP: 8; KDD: 1
Years: 2022: 7; 2021: 10; 2020: 3; 2019: 2; 2018: 2; 2017: 1; 2016: 1
ISCA Sections: spoken language understanding: 3; resource-constrained asr: 3; applications in transcription, education and learning: 1; neural network training methods for asr: 1; multi- and cross-lingual asr, other topics in asr: 1; self-supervision and semi-supervision for neural asr training: 1; computational resource constrained speech recognition: 1; neural networks for language modeling: 1; speech synthesis: 1; syllabification, rhythm, and voice activity detection: 1; language modeling: 1; source separation and voice activity detection: 1; spoken documents, spoken understanding and semantic analysis: 1
IEEE Keywords: speech recognition: 8; natural language processing: 4; recurrent neural nets: 3; end to end: 2; automatic speech recognition: 2; second pass rescoring: 2; language modeling: 2; optimisation: 2; decoding: 1; latency: 1; personalization: 1; cache storage: 1; streaming: 1; bert: 1; minimum wer training: 1; pretrained model: 1; masked language model: 1; multi accent asr: 1; domain adversarial training: 1; end to end asr: 1; accent invariance: 1; rnn transducer: 1; domain adaptation: 1; error analysis: 1; recurrent neural network transducer (rnn t): 1; inference optimization: 1; audio signal processing: 1; on device speech recognition: 1; recurrent neural network transducer: 1; multilingual: 1; language identification: 1; joint modeling: 1; code switching: 1; neural interfaces: 1; reinforce: 1; entropy: 1; multitask training: 1; spoken language understanding: 1; hidden markov models: 1; minimum word error rate: 1; attention: 1
Most Publications: 2021: 31; 2022: 13; 2020: 11; 2018: 9; 2023: 7

ICASSP2022 Anastasios Alexandridis, Grant P. Strimel, Ariya Rastrow, Pavel Kveton, Jon Webb, Maurizio Omologo, Siegfried Kunzmann, Athanasios Mouchtaris, 
Caching Networks: Capitalizing on Common Speech for ASR.

ICASSP2022 Liyan Xu, Yile Gu, Jari Kolehmainen, Haidar Khan, Ankur Gandhe, Ariya Rastrow, Andreas Stolcke, Ivan Bulyko, 
RescoreBERT: Discriminative Speech Recognition Rescoring With Bert.

Interspeech2022 Phani Sankar Nidadavolu, Na Xu, Nick Jutila, Ravi Teja Gadde, Aswarth Abhilash Dara, Joseph Savold, Sapan Patel, Aaron Hoff, Veerdhawal Pande, Kevin Crews, Ankur Gandhe, Ariya Rastrow, Roland Maas, 
RefTextLAS: Reference Text Biased Listen, Attend, and Spell Model For Accurate Reading Evaluation.

Interspeech2022 Anirudh Raju, Milind Rao, Gautam Tiwari, Pranav Dheram, Bryan Anderson, Zhe Zhang, Chul Lee, Bach Bui, Ariya Rastrow
On joint training with interfaces for spoken language understanding.

Interspeech2022 Yi Xie, Jonathan Macoskey, Martin Radfar, Feng-Ju Chang, Brian King, Ariya Rastrow, Athanasios Mouchtaris, Grant P. Strimel, 
Compute Cost Amortized Transformer for Streaming ASR.

Interspeech2022 Kai Zhen, Hieu Duy Nguyen, Raviteja Chinta, Nathan Susanj, Athanasios Mouchtaris, Tariq Afzal, Ariya Rastrow
Sub-8-Bit Quantization Aware Training for 8-Bit Neural Network Accelerator with On-Device Speech Recognition.

KDD2022 Gopinath Chennupati, Milind Rao, Gurpreet Chadha, Aaron Eakin, Anirudh Raju, Gautam Tiwari, Anit Kumar Sahu, Ariya Rastrow, Jasha Droppo, Andy Oberlin, Buddha Nandanoor, Prahalad Venkataramanan, Zheng Wu, Pankaj Sitpure, 
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale.

ICASSP2021 Hu Hu, Xuesong Yang, Zeynab Raeesy, Jinxi Guo, Gokce Keskin, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Roland Maas, 
REDAT: Accent-Invariant Representation for End-To-End ASR by Domain Adversarial Training with Relabeling.

ICASSP2021 Linda Liu, Yile Gu, Aditya Gourav, Ankur Gandhe, Shashank Kalmane, Denis Filimonov, Ariya Rastrow, Ivan Bulyko, 
Domain-Aware Neural Language Models for Speech Recognition.

ICASSP2021 Jon Macoskey, Grant P. Strimel, Ariya Rastrow
Bifocal Neural ASR: Exploiting Keyword Spotting for Inference Optimization.

ICASSP2021 Surabhi Punjabi, Harish Arsikere, Zeynab Raeesy, Chander Chandak, Nikhil Bhave, Ankish Bansal, Markus Müller, Sergio Murillo, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Sri Garimella, Roland Maas, Mat Hans, Athanasios Mouchtaris, Siegfried Kunzmann, 
Joint ASR and Language Identification Using RNN-T: An Efficient Approach to Dynamic Language Switching.

ICASSP2021 Milind Rao, Pranav Dheram, Gautam Tiwari, Anirudh Raju, Jasha Droppo, Ariya Rastrow, Andreas Stolcke, 
Do as I Mean, Not as I Say: Sequence Loss Training for Spoken Language Understanding.

Interspeech2021 Jonathan Macoskey, Grant P. Strimel, Ariya Rastrow
Learning a Neural Diff for Speech Models.

Interspeech2021 Jonathan Macoskey, Grant P. Strimel, Jinru Su, Ariya Rastrow
Amortized Neural Networks for Low-Latency Speech Recognition.

Interspeech2021 Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann, Ariya Rastrow
FANS: Fusing ASR and NLU for On-Device SLU.

Interspeech2021 Swayambhu Nath Ray, Minhua Wu, Anirudh Raju, Pegah Ghahremani, Raghavendra Bilgi, Milind Rao, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, 
Listen with Intent: Improving Speech Recognition with Audio-to-Intent Front-End.

Interspeech2021 Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas, 
wav2vec-C: A Self-Supervised Model for Speech Representation Learning.

ICASSP2020 Ankur Gandhe, Ariya Rastrow
Audio-Attention Discriminative Language Model for ASR Rescoring.

Interspeech2020 Milind Rao, Anirudh Raju, Pranav Dheram, Bach Bui, Ariya Rastrow
Speech to Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces.

Interspeech2020 Grant P. Strimel, Ariya Rastrow, Gautam Tiwari, Adrien Piérard, Jon Webb, 
Rescore in a Flash: Compact, Cache Efficient Hashing Data Structures for n-Gram Language Models.

#156  | Laurent Besacier | Google Scholar   DBLP
Venues — Interspeech: 19; ICASSP: 6; TASLP: 1
Years — 2022: 3; 2021: 3; 2020: 7; 2019: 3; 2018: 4; 2017: 1; 2016: 5
ISCA Section — speech synthesis: 3; special session: 3; spoken language understanding: 2; speech translation and multilingual/multimodal learning: 2; inclusive and fair speech technologies: 1; miscellanous topics in asr: 1; spoken language processing: 1; the zero resource speech challenge 2020: 1; spoken term detection: 1; the zero resource speech challenge 2019: 1; zero-resource speech recognition: 1; show & tell session: 1; speech translation and metadata for linguistic/discourse structure: 1
IEEE Keyword — natural language processing: 5; text analysis: 2; image retrieval: 2; human computer interaction: 1; image representation: 1; automatic speech recognition: 1; speech synthesis: 1; unsupervised learning: 1; computational linguistics: 1; media corpus: 1; data efficiency: 1; grammars: 1; joint learning: 1; end to end slu: 1; interactive systems: 1; sequence to sequence models: 1; speech recognition: 1; recurrent neural networks: 1; information retrieval: 1; word processing: 1; cross lingual speech retrieval: 1; grounded language learning: 1; attention mechanism: 1; convolution: 1; large vocabulary continuous speech recognition: 1; performance prediction: 1; convolutional neural networks: 1; feedforward neural nets: 1; low resource asr: 1; bayes methods: 1; acoustic unit discovery: 1; bayesian model: 1; informative prior: 1; multi modal data: 1; unwritten languages: 1; unsupervised unit discovery: 1; machine translation: 1; linguistics: 1; iterative methods: 1; optical character recognition: 1; annotation propagation: 1; clustering: 1; ocr: 1; pattern clustering: 1; speaker recognition: 1; speaker identification: 1; active learning: 1
Most Publications — 2020: 36; 2018: 27; 2016: 26; 2021: 25; 2019: 22

Interspeech2022 Marcely Zanon Boito, Laurent Besacier, Natalia A. Tomashenko, Yannick Estève, 
A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems.

Interspeech2022 Valentin Pelloin, Franck Dary, Nicolas Hervé, Benoît Favre, Nathalie Camelin, Antoine Laurent, Laurent Besacier
ASR-Generated Text for Language Model Pre-training Applied to Speech Tasks.

Interspeech2022 Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber, 
BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model.

Interspeech2021 Solène Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia A. Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Estève, Benjamin Lecouteux, François Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier
LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech.

Interspeech2021 Ha Nguyen, Yannick Estève, Laurent Besacier
Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation.

Interspeech2021 Brooke Stephenson, Thomas Hueber, Laurent Girin, Laurent Besacier
Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input.

TASLP2020 Odette Scharenborg, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller 0001, 
Speech Technology for Unwritten Languages.

ICASSP2020 Marco Dinarelli, Nikita Kapoor, Bassam Jabaian, Laurent Besacier
A Data Efficient End-to-End Spoken Language Understanding Architecture.

Interspeech2020 Ewan Dunbar, Julien Karadayi, Mathieu Bernard, Xuan-Nga Cao, Robin Algayres, Lucas Ondel, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux, 
The Zero Resource Speech Challenge 2020: Discovering Discrete Subword and Word Units.

Interspeech2020 Maha Elbayad, Laurent Besacier, Jakob Verbeek, 
Efficient Wait-k Models for Simultaneous Machine Translation.

Interspeech2020 Ha Nguyen, Fethi Bougares, Natalia A. Tomashenko, Yannick Estève, Laurent Besacier
Investigating Self-Supervised Pre-Training for End-to-End Speech Translation.

Interspeech2020 Vaishali Pal, Fabien Guillot, Manish Shrivastava 0001, Jean-Michel Renders, Laurent Besacier
Modeling ASR Ambiguity for Neural Dialogue State Tracking.

Interspeech2020 Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber, 
What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS.

ICASSP2019 William N. Havard, Jean-Pierre Chevrot, Laurent Besacier
Models of Visually Grounded Speech Signal Pay Attention to Nouns: A Bilingual Experiment on English and Japanese.

Interspeech2019 Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier
Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-Resource Settings.

Interspeech2019 Ewan Dunbar, Robin Algayres, Julien Karadayi, Mathieu Bernard, Juan Benjumea, Xuan-Nga Cao, Lucie Miskic, Charlotte Dugrain, Lucas Ondel, Alan W. Black, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux, 
The Zero Resource Speech Challenge 2019: TTS Without T.

ICASSP2018 Zied Elloumi, Laurent Besacier, Olivier Galibert, Juliette Kahn, Benjamin Lecouteux, 
ASR Performance Prediction on Unseen Broadcast Programs Using Convolutional Neural Networks.

ICASSP2018 Lucas Ondel, Pierre Godard, Laurent Besacier, Elin Larsen, Mark Hasegawa-Johnson, Odette Scharenborg, Emmanuel Dupoux, Lukás Burget, François Yvon, Sanjeev Khudanpur, 
Bayesian Models for Unit Discovery on a Very Low Resource Language.

ICASSP2018 Odette Scharenborg, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller 0001, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, 
Linguistic Unit Discovery from Multi-Modal Inputs in Unwritten Languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop.

Interspeech2018 Pierre Godard, Marcely Zanon Boito, Lucas Ondel, Alexandre Berard, François Yvon, Aline Villavicencio, Laurent Besacier
Unsupervised Word Segmentation from Speech with Attention.

#157  | Khe Chai Sim | Google Scholar   DBLP
Venues — Interspeech: 16; ICASSP: 9; TASLP: 1
Years — 2022: 5; 2021: 2; 2020: 1; 2019: 3; 2018: 5; 2017: 3; 2016: 7
ISCA Section — acoustic model adaptation: 2; language modeling and lexical modeling for asr: 1; low-resource asr development: 1; zero, low-resource and multi-modal speech recognition: 1; multi-, cross-lingual and other topics in asr: 1; self-supervision and semi-supervision for neural asr training: 1; topics in asr: 1; model adaptation for asr: 1; far-field speech recognition: 1; multi-lingual models and adaptation for asr: 1; acoustic models for asr: 1; robustness in speech processing: 1; far-field, robustness and adaptation: 1; robustness and adaptation: 1; new trends in neural networks for speech recognition: 1
IEEE Keyword — speech recognition: 8; recurrent neural nets: 5; natural language processing: 4; interpretability: 2; speaker adaptation: 2; speaker recognition: 2; domain adaptation: 1; semi supervised learning (artificial intelligence): 1; self supervised learning: 1; semi supervised learning: 1; rnn t: 1; optimisation: 1; on device learning: 1; low rank gradient: 1; memory reduction: 1; approximation theory: 1; mobile computing: 1; gradient methods: 1; speech coding: 1; mobile handsets: 1; stimulated learning: 1; sequence classification: 1; connectionist temporal classification: 1; activation regularisation: 1; visualisation: 1; neural network: 1; recurrent state visualization: 1; long short term memory: 1; data visualisation: 1; grapheme sequence models: 1; sequence to sequence: 1; multi dialect: 1; adaptation: 1; acoustic modeling: 1; recurrent neural networks (rnns): 1; student teacher training: 1; entropy: 1; long short term memory (lstm): 1; speaker normalization: 1; unsupervised learning: 1; interpolation: 1; deep neural networks: 1; automatic speech recognition: 1; signal representation: 1; i vector: 1; lstm rnns: 1; speaking rate: 1; speaker aware training: 1; feedforward neural nets: 1
Most Publications — 2022: 16; 2016: 14; 2014: 12; 2013: 11; 2012: 11

ICASSP2022 Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He, 
Large-Scale ASR Domain Adaptation Using Self- and Semi-Supervised Learning.

Interspeech2022 Theresa Breiner, Swaroop Ramaswamy, Ehsan Variani, Shefali Garg, Rajiv Mathews, Khe Chai Sim, Kilol Gupta, Mingqing Chen, Lara McConnaughey, 
UserLibri: A Dataset for ASR Personalization Using Only Text.

Interspeech2022 Zhouyuan Huo, Dongseong Hwang, Khe Chai Sim, Shefali Garg, Ananya Misra, Nikhil Siddhartha, Trevor Strohman, Françoise Beaufays, 
Incremental Layer-Wise Self-Supervised Learning for Efficient Unsupervised Speech Domain Adaptation On Device.

Interspeech2022 Dongseong Hwang, Khe Chai Sim, Zhouyuan Huo, Trevor Strohman, 
Pseudo Label Is Better Than Human Label.

Interspeech2022 Golan Pundak, Tsendsuren Munkhdalai, Khe Chai Sim
On-the-fly ASR Corrections with Audio Exemplars.

Interspeech2021 Ananya Misra, Dongseong Hwang, Zhouyuan Huo, Shefali Garg, Nikhil Siddhartha, Arun Narayanan, Khe Chai Sim
A Comparison of Supervised and Unsupervised Pre-Training of End-to-End Models.

Interspeech2021 Khe Chai Sim, Angad Chandorkar, Fan Gao, Mason Chua, Tsendsuren Munkhdalai, Françoise Beaufays, 
Robust Continuous On-Device Personalization for Automatic Speech Recognition.

ICASSP2020 Mary Gooneratne, Khe Chai Sim, Petr Zadrazil, Andreas Kabel, Françoise Beaufays, Giovanni Motta, 
Low-Rank Gradient Approximation for Memory-Efficient on-Device Training of Deep Neural Network.

ICASSP2019 Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, Qiao Liang, Deepti Bhatia, Yuan Shangguan, Bo Li 0028, Golan Pundak, Khe Chai Sim, Tom Bagby, Shuo-Yiin Chang, Kanishka Rao, Alexander Gruenstein, 
Streaming End-to-end Speech Recognition for Mobile Devices.

ICASSP2019 Jahn Heymann, Khe Chai Sim, Bo Li 0028, 
Improving CTC Using Stimulated Learning for Sequence Modeling.

Interspeech2019 Khe Chai Sim, Petr Zadrazil, Françoise Beaufays, 
An Investigation into On-Device Personalization of End-to-End Automatic Speech Recognition Models.

TASLP2018 Chunyang Wu, Mark J. F. Gales, Anton Ragni, Penny Karanasou, Khe Chai Sim
Improving Interpretability and Regularization in Deep Learning.

ICASSP2018 Skanda Koppula, Khe Chai Sim, Kean K. Chin, 
Understanding Recurrent Neural State Using Memory Signatures.

ICASSP2018 Bo Li 0028, Tara N. Sainath, Khe Chai Sim, Michiel Bacchiani, Eugene Weinstein, Patrick Nguyen, Zhifeng Chen, Yonghui Wu, Kanishka Rao, 
Multi-Dialect Speech Recognition with a Single Sequence-to-Sequence Model.

ICASSP2018 Lahiru Samarakoon, Brian Mak, Khe Chai Sim
Learning Effective Factorized Hidden Layer Bases Using Student-Teacher Training for LSTM Acoustic Model Adaptation.

Interspeech2018 Khe Chai Sim, Arun Narayanan, Ananya Misra, Anshuman Tripathi, Golan Pundak, Tara N. Sainath, Parisa Haghani, Bo Li 0028, Michiel Bacchiani, 
Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition.

Interspeech2017 Bo Li 0028, Tara N. Sainath, Arun Narayanan, Joe Caroselli, Michiel Bacchiani, Ananya Misra, Izhak Shafran, Hasim Sak, Golan Pundak, Kean K. Chin, Khe Chai Sim, Ron J. Weiss, Kevin W. Wilson, Ehsan Variani, Chanwoo Kim 0001, Olivier Siohan, Mitchel Weintraub, Erik McDermott, Richard Rose, Matt Shannon, 
Acoustic Modeling for Google Home.

Interspeech2017 Lahiru Samarakoon, Brian Mak, Khe Chai Sim
Learning Factorized Transforms for Unsupervised Adaptation of LSTM-RNN Acoustic Models.

Interspeech2017 Khe Chai Sim, Arun Narayanan, 
An Efficient Phone N-Gram Forward-Backward Computation Using Dense Matrix Multiplication.

ICASSP2016 Lahiru Samarakoon, Khe Chai Sim
On combining i-vectors and discriminative adaptation methods for unsupervised speaker normalization in DNN acoustic models.

#158  | Arun Narayanan | Google Scholar   DBLP
Venues — Interspeech: 17; ICASSP: 8; TASLP: 1
Years — 2022: 7; 2021: 6; 2020: 2; 2018: 3; 2017: 6; 2016: 2
ISCA Section — far-field speech recognition: 2; privacy and security in speech communication: 1; speech segmentation: 1; single-channel speech enhancement: 1; robust asr, and far-field/multi-talker asr: 1; dereverberation and echo cancellation: 1; self-supervision and semi-supervision for neural asr training: 1; spoken term detection & voice search: 1; streaming for asr/rnn transducers: 1; noise robust and distant speech recognition: 1; distant asr: 1; acoustic model adaptation: 1; noise robust and far-field asr: 1; discriminative training for asr: 1; acoustic models for asr: 1; far-field speech processing: 1
IEEE Keyword — speech recognition: 7; recurrent neural nets: 3; speech coding: 2; speaker recognition: 2; conformer: 2; natural language processing: 2; optimisation: 2; speech enhancement: 2; text analysis: 1; decoding: 1; transducers: 1; rnnt: 1; two pass asr: 1; long form asr: 1; end to end asr: 1; non streaming asr: 1; model distillation: 1; streaming asr: 1; latency: 1; cascaded encoders: 1; rnn t: 1; regression analysis: 1; probability: 1; vocabulary: 1; deep neural network model: 1; phase sensitive model spectral distortion model: 1; far field speech recognition: 1; phase distortion training: 1; spectral distortion training: 1; noise robust speech recognition: 1; microphones: 1; array signal processing: 1; beamforming: 1; spatial filters: 1; direction of arrival estimation: 1; channel bank filters: 1; filtering theory: 1; acoustic convolution: 1
Most Publications — 2021: 23; 2022: 19; 2020: 13; 2017: 10; 2018: 7

ICASSP2022 Ke Hu, Tara N. Sainath, Arun Narayanan, Ruoming Pang, Trevor Strohman, 
Transducer-Based Streaming Deliberation for Cascaded Encoders.

ICASSP2022 Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.

Interspeech2022 Ehsan Amid, Om Dipakbhai Thakkar, Arun Narayanan, Rajiv Mathews, Françoise Beaufays, 
Extracting Targeted Training Data from ASR Models, and How to Mitigate It.

Interspeech2022 Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw, 
Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition.

Interspeech2022 Yuma Koizumi, Shigeki Karita, Arun Narayanan, Sankaran Panchapagesan, Michiel Bacchiani, 
SNRi Target Training for Joint Speech Enhancement and Recognition.

Interspeech2022 Thomas R. O'Malley, Arun Narayanan, Quan Wang, 
A universally-deployable ASR frontend for joint acoustic echo cancellation, speech enhancement, and voice separation.

Interspeech2022 Sankaran Panchapagesan, Arun Narayanan, Turaj Zakizadeh Shabestary, Shuai Shao, Nathan Howard, Alex Park 0001, James Walker, Alexander Gruenstein, 
A Conformer-based Waveform-domain Neural Acoustic Echo Canceller Optimized for ASR Accuracy.

ICASSP2021 Thibault Doutre, Wei Han 0002, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang 0033, Liangliang Cao, 
Improving Streaming Automatic Speech Recognition with Non-Streaming Model Distillation on Unsupervised Data.

ICASSP2021 Bo Li 0028, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han 0002, Qiao Liang, Yu Zhang 0033, Trevor Strohman, Yonghui Wu, 
A Better and Faster end-to-end Model for Streaming ASR.

ICASSP2021 Jiahui Yu, Chung-Cheng Chiu, Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han 0002, Anmol Gulati, Yonghui Wu, Ruoming Pang, 
FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.

Interspeech2021 Ananya Misra, Dongseong Hwang, Zhouyuan Huo, Shefali Garg, Nikhil Siddhartha, Arun Narayanan, Khe Chai Sim, 
A Comparison of Supervised and Unsupervised Pre-Training of End-to-End Models.

Interspeech2021 Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ding Zhao, Yiteng Huang, Arun Narayanan, Ian McGraw, 
Personalized Keyphrase Detection Using Speaker and Environment Information.

Interspeech2021 Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Ruoming Pang, David Rybach, Cyril Allauzen, Ehsan Variani, James Qin, Quoc-Nam Le-The, Shuo-Yiin Chang, Bo Li 0028, Anmol Gulati, Jiahui Yu, Chung-Cheng Chiu, Diamantino Caseiro, Wei Li 0133, Qiao Liang, Pat Rondon, 
An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling.

ICASSP2020 Tara N. Sainath, Yanzhang He, Bo Li 0028, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li 0133, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alexander Gruenstein, Ke Hu, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirkó Visontai, Yonghui Wu, Yu Zhang 0033, Ding Zhao, 
A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency.

Interspeech2020 Antoine Bruguier, Ananya Misra, Arun Narayanan, Rohit Prabhavalkar, 
Anti-Aliasing Regularization in Stacking Layers.

ICASSP2018 Chanwoo Kim 0001, Tara N. Sainath, Arun Narayanan, Ananya Misra, Rajeev C. Nongpiur, Michiel Bacchiani, 
Spectral Distortion Model for Training Phase-Sensitive Deep-Neural Networks for Far-Field Speech Recognition.

Interspeech2018 Chanwoo Kim 0001, Ehsan Variani, Arun Narayanan, Michiel Bacchiani, 
Efficient Implementation of the Room Simulator for Training Deep Neural Network Acoustic Models.

Interspeech2018 Khe Chai Sim, Arun Narayanan, Ananya Misra, Anshuman Tripathi, Golan Pundak, Tara N. Sainath, Parisa Haghani, Bo Li 0028, Michiel Bacchiani, 
Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition.

TASLP2017 Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Bo Li 0028, Arun Narayanan, Ehsan Variani, Michiel Bacchiani, Izhak Shafran, Andrew W. Senior, Kean K. Chin, Ananya Misra, Chanwoo Kim 0001, 
Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition.

Interspeech2017 Joe Caroselli, Izhak Shafran, Arun Narayanan, Richard Rose, 
Adaptive Multichannel Dereverberation for Automatic Speech Recognition.

#159  | Visar Berisha | Google Scholar   DBLP
Venues — Interspeech: 17; ICASSP: 9
Years — 2022: 2; 2021: 3; 2020: 3; 2019: 6; 2018: 4; 2017: 4; 2016: 4
ISCA Section — voice, speech and hearing disorders: 2; speech and language in health: 1; pathological speech analysis: 1; assessment of pathological speech and language: 1; speech enhancement and intelligibility: 1; asr neural network architectures: 1; voice and hearing disorders: 1; emotion and personality in conversation: 1; voice quality characterization for clinical voice assessment: 1; speech and language analytics for mental health: 1; speech and audio classification: 1; speaker verification using neural network methods: 1; applications in education and learning: 1; spoken dialogue systems and conversational analysis: 1; special session: 1; behavioral signal processing and speaker state and traits analytics: 1
IEEE Keyword — medical signal processing: 4; hypernasality: 3; recurrent neural networks: 3; speech: 3; medical disorders: 3; signal classification: 2; speech recognition: 2; dysarthric speech: 2; dysarthria: 2; cleft palate: 1; attention: 1; pattern classification: 1; cleft lip and palate: 1; velopharyngeal dysfunction: 1; bioacoustics: 1; deep neural network: 1; tremor: 1; diseases: 1; amyotrophic lateral sclerosis (als): 1; patient diagnosis: 1; patient treatment: 1; neurophysiology: 1; velopharyngeal dysfunction: 1; automatic speech recognition: 1; adversarial training: 1; data augmentation: 1; voice conversion: 1; data analysis: 1; clinical applications: 1; phonological features: 1; statistics: 1; biomedical measurement: 1; spectral analysis: 1; self organising feature maps: 1; regression analysis: 1; probability: 1; objective assessment: 1; speech pathology: 1; divergence: 1; distribution regression: 1; recurrent neural nets: 1; clinical tool: 1; speaking rate estimation: 1; feature selection: 1; divergence measures: 1; bayes methods: 1; non parametric estimator: 1; multi class classification: 1; bayes error rate: 1
Most Publications — 2019: 14; 2018: 13; 2020: 12; 2017: 10; 2016: 10

Interspeech2022 Visar Berisha, Chelsea Krantsevich, Gabriela Stegmann, Shira Hahn, Julie Liss, 
Are reported accuracies in the clinical speech machine learning literature overoptimistic?

Interspeech2022 Kelvin Tran, Lingfeng Xu, Gabriela Stegmann, Julie Liss, Visar Berisha, Rene Utianski, 
Investigating the Impact of Speech Compression on the Acoustics of Dysarthric Speech.

ICASSP2021 Vikram C. Mathad, Nancy Scherer, Kathy Chapman, Julie Liss, Visar Berisha
An Attention Model for Hypernasality Prediction in Children with Cleft Palate.

Interspeech2021 Vikram C. Mathad, Tristan J. Mahr, Nancy Scherer, Kathy Chapman, Katherine C. Hustad, Julie Liss, Visar Berisha
The Impact of Forced-Alignment Errors on Automatic Pronunciation Evaluation.

Interspeech2021 Jianwei Zhang, Suren Jayasuriya, Visar Berisha
Restoring Degraded Speech via a Modified Diffusion Model.

ICASSP2020 Vikram C. Mathad, Kathy Chapman, Julie Liss, Nancy Scherer, Visar Berisha
Deep Learning Based Prediction of Hypernasality for Clinical Applications.

Interspeech2020 Deepak Kadetotad, Jian Meng, Visar Berisha, Chaitali Chakrabarti, Jae-sun Seo, 
Compressing LSTM Networks with Hierarchical Coarse-Grain Sparsity.

Interspeech2020 Meredith Moore, Piyush Papreja, Michael Saxon, Visar Berisha, Sethuraman Panchanathan, 
UncommonVoice: A Crowdsourced Dataset of Dysphonic Speech.

ICASSP2019 Jacob Peplinski, Visar Berisha, Julie Liss, Shira Hahn, Jeremy Shefner, Seward B. Rutkove, Kristin Qi, Kerisa Shelton, 
Objective Assessment of Vocal Tremor.

ICASSP2019 Michael Saxon, Julie Liss, Visar Berisha
Objective Measures of Plosive Nasalization in Hypernasal Speech.

Interspeech2019 Nichola Lubold, Stephanie A. Borrie, Tyson S. Barrett, Megan M. Willi, Visar Berisha
Do Conversational Partners Entrain on Articulatory Precision?

Interspeech2019 Meredith Moore, Michael Saxon, Hemanth Venkateswara, Visar Berisha, Sethuraman Panchanathan, 
Say What? A Dataset for Exploring the Error Patterns That Two ASR Engines Make.

Interspeech2019 Rohit Voleti, Stephanie Woolridge, Julie M. Liss, Melissa Milanovic, Christopher R. Bowie, Visar Berisha
Objective Assessment of Social Skills Using Automated Language Analysis for Identification of Schizophrenia and Bipolar Disorder.

Interspeech2019 Yan Xiong, Visar Berisha, Chaitali Chakrabarti, 
Residual + Capsule Networks (ResCap) for Simultaneous Single-Channel Overlapped Keyword Recognition.

ICASSP2018 Yishan Jiao, Ming Tu, Visar Berisha, Julie Liss, 
Simulating Dysarthric Speech for Training Data Augmentation in Clinical Speech Applications.

Interspeech2018 Huan Song, Megan M. Willi, Jayaraman J. Thiagarajan, Visar Berisha, Andreas Spanias, 
Triplet Network with Attention for Speaker Diarization.

Interspeech2018 Ming Tu, Anna Grabek, Julie Liss, Visar Berisha
Investigating the Role of L1 in Automatic Pronunciation Evaluation of L2 Speech.

Interspeech2018 Megan M. Willi, Stephanie A. Borrie, Tyson S. Barrett, Ming Tu, Visar Berisha
A Discriminative Acoustic-Prosodic Approach for Measuring Local Entrainment.

ICASSP2017 Yishan Jiao, Visar Berisha, Julie Liss, 
Interpretable phonological features for clinical applications.

ICASSP2017 Ming Tu, Visar Berisha, Julie Liss, 
Objective assessment of pathological speech using distribution regression.

#160  | Xueliang Zhang 0001 | Google Scholar   DBLP
Venues — Interspeech: 14; TASLP: 6; ICASSP: 6
Years — 2022: 6; 2021: 4; 2020: 5; 2019: 2; 2018: 3; 2017: 3; 2016: 3
ISCA Section — dereverberation and echo cancellation: 2; speech-enhancement: 2; speech coding and privacy: 1; conferencingspeech 2021 challenge: 1; interspeech 2021 deep noise suppression challenge: 1; noise robust and distant speech recognition: 1; speech and audio quality assessment: 1; phonetic event detection and segmentation: 1; speech enhancement: 1; novel approaches to enhancement: 1; deep learning for source separation and pitch tracking: 1; source separation and spatial audio: 1
IEEE Keyword — speech enhancement: 5; recurrent neural nets: 4; acoustic noise: 3; complex spectral mapping: 3; convolutional neural nets: 3; bone conduction: 2; attention based fusion: 2; time frequency analysis: 2; microphones: 2; speech intelligibility: 2; deep neural network (dnn): 2; spectro temporal structures: 2; speech separation: 2; matrix decomposition: 2; air conduction: 1; sensor fusion: 1; signal denoising: 1; microphone array processing: 1; speech dereverberation: 1; convolutional recurrent neural network: 1; mathematical operators: 1; computational complexity: 1; function smoothing: 1; loss metric mismatch: 1; supervised single channel speech enhancement: 1; long short term memory: 1; recurrent neural networks: 1; frame level snr estimation: 1; feature combination: 1; on device processing: 1; mobile communication: 1; dual microphone mobile phones: 1; real time speech enhancement: 1; densely connected convolutional recurrent network: 1; microphone arrays: 1; speech coding: 1; monaural speech enhancement: 1; denoising autoencoder: 1; generative vocoder: 1; vocoders: 1; joint framework: 1; steered response power: 1; gcc phat: 1; time frequency masking: 1; audio signal processing: 1; robust speaker localization: 1; direction of arrival estimation: 1; deep neural networks: 1; nonnegative matrix factorization (nmf): 1; binaural speech separation: 1; array signal processing: 1; beamforming: 1; spectral analysis: 1; computational auditory scene analysis (casa): 1; room reverberation: 1; source separation: 1; reverberation: 1; optimisation: 1; deep neural network: 1; speech synthesis: 1; nonnegative matrix factorization: 1; convolution: 1; pitch determination: 1; dynamic programming: 1; convolutional neural network: 1
Most Publications — 2022: 13; 2020: 11; 2019: 11; 2021: 7; 2018: 7

Affiliations
Inner Mongolia University, College of Computer Science, Inner Mongolia Key Laboratory of Mongolian Information Processing Technology, China
Chinese Academy of Sciences, National Laboratory of Pattern Recognition, NLPR, Institute of Automation, China

TASLP2022 Heming Wang, Xueliang Zhang 0001, DeLiang Wang, 
Fusing Bone-Conduction and Air-Conduction Sensors for Complex-Domain Speech Enhancement.

ICASSP2022 Jinjiang Liu, Xueliang Zhang 0001
DRC-NET: Densely Connected Recurrent Convolutional Neural Network for Speech Dereverberation.

ICASSP2022 Heming Wang, Xueliang Zhang 0001, DeLiang Wang, 
Attention-Based Fusion for Bone-Conducted and Air-Conducted Speech Enhancement in the Complex Domain.

ICASSP2022 Yang Yang, Hui Zhang 0031, Xueliang Zhang 0001, Huaiwen Zhang, 
Alleviating the Loss-Metric Mismatch in Supervised Single-Channel Speech Enhancement.

Interspeech2022 Jiahui Pan, Shuai Nie, Hui Zhang 0031, Shulin He, Kanghao Zhang, Shan Liang, Xueliang Zhang 0001, Jianhua Tao, 
Speaker recognition-assisted robust audio deepfake detection.

Interspeech2022 Chenggang Zhang, Jinjiang Liu, Xueliang Zhang 0001
LCSM: A Lightweight Complex Spectral Mapping Framework for Stereophonic Acoustic Echo Cancellation.

TASLP2021 Hao Li 0046, DeLiang Wang, Xueliang Zhang 0001, Guanglai Gao, 
Recurrent Neural Networks and Acoustic Features for Frame-Level Signal-to-Noise Ratio Estimation.

ICASSP2021 Ke Tan 0001, Xueliang Zhang 0001, DeLiang Wang, 
Real-Time Speech Enhancement for Mobile Communication Based on Dual-Channel Complex Spectral Mapping.

Interspeech2021 Jinjiang Liu, Xueliang Zhang 0001
Inplace Gated Convolutional Recurrent Neural Network for Dual-Channel Speech Enhancement.

Interspeech2021 Kanghao Zhang, Shulin He, Hao Li 0046, Xueliang Zhang 0001
DBNet: A Dual-Branch Network Architecture Processing on Spectrum and Waveform for Single-Channel Speech Enhancement.

TASLP2020 Zhihao Du, Xueliang Zhang 0001, Jiqing Han, 
A Joint Framework of Denoising Autoencoder and Generative Vocoder for Monaural Speech Enhancement.

Interspeech2020 Zhihao Du, Jiqing Han, Xueliang Zhang 0001
Double Adversarial Network Based Monaural Speech Enhancement for Robust Speech Recognition.

Interspeech2020 Hao Li 0046, DeLiang Wang, Xueliang Zhang 0001, Guanglai Gao, 
Frame-Level Signal-to-Noise Ratio Estimation Using Deep Learning.

Interspeech2020 Tianjiao Xu, Hui Zhang 0031, Xueliang Zhang 0001
Polishing the Classical Likelihood Ratio Test by Supervised Learning for Voice Activity Detection.

Interspeech2020 Chenggang Zhang, Xueliang Zhang 0001
A Robust and Cascaded Acoustic Echo Cancellation Based on Deep Learning.

TASLP2019 Zhong-Qiu Wang, Xueliang Zhang 0001, DeLiang Wang, 
Robust Speaker Localization Guided by Deep Learning-Based Time-Frequency Masking.

Interspeech2019 Yun Liu, Hui Zhang 0031, Xueliang Zhang 0001, Yuhang Cao, 
Investigation of Cost Function for Supervised Monaural Speech Separation.

TASLP2018 Shuai Nie, Shan Liang, Wenju Liu, Xueliang Zhang 0001, Jianhua Tao, 
Deep Learning Based Speech Separation via NMF-Style Reconstructions.

Interspeech2018 Yun Liu, Hui Zhang 0031, Xueliang Zhang 0001
Using Shifted Real Spectrum Mask as Training Target for Supervised Speech Separation.

Interspeech2018 Zhong-Qiu Wang, Xueliang Zhang 0001, DeLiang Wang, 
Robust TDOA Estimation Based on Time-Frequency Masking and Deep Neural Networks.

#161  | Jean-François Bonastre | Google Scholar   DBLP
Venues — Interspeech: 21; ICASSP: 4; TASLP: 1
Years — 2022: 4; 2021: 3; 2020: 5; 2019: 2; 2018: 3; 2017: 3; 2016: 6
ISCA Section — voice privacy challenge: 3; speaker and language recognition applications: 2; speaker recognition: 2; embedding and network architecture for speaker recognition: 1; self supervision and anti-spoofing: 1; phonation and voicing: 1; speaker, language, and privacy: 1; speaker diarization: 1; speech signal representation: 1; learning techniques for speaker recognition: 1; voice conversion for style, accent, and emotion: 1; speaker verification: 1; spoken corpora and annotation: 1; speaker recognition evaluation: 1; special event: 1; special session: 1; speaker diarization and recognition: 1
IEEE Keyword — speaker recognition: 4; speaker verification: 2; i vector: 2; personalized acoustic models: 1; acoustic model: 1; speaker information: 1; hidden markov models: 1; automatic speech recognition: 1; collaborative learning: 1; federated learning: 1; attack models: 1; privacy: 1; acoustic models: 1; speech recognition: 1; siamese networks: 1; similarity metric: 1; voice casting: 1; information retrieval: 1; speech synthesis: 1; least mean squares methods: 1; additive noise: 1; joint modeling: 1; short utterance: 1; estimation theory: 1; forensic science: 1; bayes methods: 1; speaker profile: 1; forensic voice comparison: 1; inter speaker variability: 1; reliability: 1
Most Publications — 2016: 16; 2020: 15; 2006: 14; 2022: 12; 2012: 12

ICASSP2022 Salima Mdhaffar, Jean-François Bonastre, Marc Tommasi, Natalia A. Tomashenko, Yannick Estève, 
Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition.

ICASSP2022 Natalia A. Tomashenko, Salima Mdhaffar, Marc Tommasi, Yannick Estève, Jean-François Bonastre
Privacy Attacks for Automatic Speech Recognition Acoustic Models in A Federated Learning Framework.

Interspeech2022 Pierre-Michel Bousquet, Mickael Rouvier, Jean-François Bonastre
Reliability criterion based on learning-phase entropy for speaker recognition with neural network.

Interspeech2022 Mohammad MohammadAmini, Driss Matrouf, Jean-François Bonastre, Sandipana Dowerah, Romain Serizel, Denis Jouvet, 
Barlow Twins self-supervised learning for robust speaker recognition.

Interspeech2021 Anaïs Chanclu, Imen Ben Amor, Cédric Gendrot, Emmanuel Ferragne, Jean-François Bonastre
Automatic Classification of Phonation Types in Spontaneous Speech: Towards a New Workflow for the Characterization of Speakers' Voice Quality.

Interspeech2021 Paul-Gauthier Noé, Mohammad MohammadAmini, Driss Matrouf, Titouan Parcollet, Andreas Nautsch, Jean-François Bonastre
Adversarial Disentanglement of Speaker Representation for Attribute-Driven Privacy Preservation.

Interspeech2021 Benjamin O'Brien, Natalia A. Tomashenko, Anaïs Chanclu, Jean-François Bonastre
Anonymous Speaker Clusters: Making Distinctions Between Anonymised Speech Recordings with Clustering Interface.

Interspeech2020 Adrien Gresse, Mathias Quillot, Richard Dufour, Jean-François Bonastre
Learning Voice Representation Using Knowledge Distillation for Automatic Voice Casting.

Interspeech2020 Ana Montalvo, José R. Calvo, Jean-François Bonastre
Multi-Task Learning for Voice Related Recognition Tasks.

Interspeech2020 Andreas Nautsch, Jose Patino 0001, Natalia A. Tomashenko, Junichi Yamagishi, Paul-Gauthier Noé, Jean-François Bonastre, Massimiliano Todisco, Nicholas W. D. Evans, 
The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment.

Interspeech2020 Paul-Gauthier Noé, Jean-François Bonastre, Driss Matrouf, Natalia A. Tomashenko, Andreas Nautsch, Nicholas W. D. Evans, 
Speech Pseudonymisation Assessment Using Voice Similarity Matrices.

Interspeech2020 Natalia A. Tomashenko, Brij Mohan Lal Srivastava, Xin Wang 0037, Emmanuel Vincent 0001, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Jose Patino 0001, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco, 
Introducing the VoicePrivacy Initiative.

ICASSP2019 Adrien Gresse, Mathias Quillot, Richard Dufour, Vincent Labatut, Jean-François Bonastre
Similarity Metric Based on Siamese Neural Networks for Voice Casting.

Interspeech2019 Itshak Lapidot, Jean-François Bonastre
Effects of Waveform PMF on Anti-Spoofing Detection.

TASLP2018 Waad Ben Kheder, Driss Matrouf, Moez Ajili, Jean-François Bonastre
A Unified Joint Model to Deal With Nuisance Variabilities in the i-Vector Space.

Interspeech2018 Moez Ajili, Jean-François Bonastre, Solange Rossato, 
Voice Comparison and Rhythm: Behavioral Differences between Target and Non-target Comparisons.

Interspeech2018 Itshak Lapidot, Héctor Delgado, Massimiliano Todisco, Nicholas W. D. Evans, Jean-François Bonastre
Speech Database and Protocol Validation Using Waveform Entropy.

Interspeech2017 Moez Ajili, Jean-François Bonastre, Waad Ben Kheder, Solange Rossato, Juliette Kahn, 
Homogeneity Measure Impact on Target and Non-Target Trials in Forensic Voice Comparison.

Interspeech2017 Adrien Gresse, Mickael Rouvier, Richard Dufour, Vincent Labatut, Jean-François Bonastre
Acoustic Pairing of Original and Dubbed Voices in the Context of Video Game Localization.

Interspeech2017 Kong-Aik Lee, Ville Hautamäki, Tomi Kinnunen, Anthony Larcher, Chunlei Zhang, Andreas Nautsch, Themos Stafylakis, Gang Liu 0001, Mickaël Rouvier, Wei Rao, Federico Alegre, J. Ma, Man-Wai Mak, Achintya Kumar Sarkar, Héctor Delgado, Rahim Saeidi, Hagai Aronowitz, Aleksandr Sizov, Hanwu Sun, Trung Hieu Nguyen 0001, G. Wang, Bin Ma 0001, Ville Vestman, Md. Sahidullah, M. Halonen, Anssi Kanervisto, Gaël Le Lan, Fahimeh Bahmaninezhad, Sergey Isadskiy, Christian Rathgeb, Christoph Busch 0001, Georgios Tzimiropoulos, Q. Qian, Z. Wang, Q. Zhao, T. Wang, H. Li, J. Xue, S. Zhu, R. Jin, T. Zhao, Pierre-Michel Bousquet, Moez Ajili, Waad Ben Kheder, Driss Matrouf, Zhi Hao Lim, Chenglin Xu, Haihua Xu, Xiong Xiao, Eng Siong Chng, Benoit G. B. Fauve, Kaavya Sriskandaraja, Vidhyasaharan Sethu, W. W. Lin, Dennis Alexander Lehmann Thomsen, Zheng-Hua Tan, Massimiliano Todisco, Nicholas W. D. Evans, Haizhou Li 0001, John H. L. Hansen, Jean-François Bonastre, Eliathamby Ambikairajah, 
The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016.

#162  | Tao Qin | Google Scholar   DBLP
Venues: Interspeech (6), ICASSP (5), NeurIPS (4), TASLP (3), ACL (2), ICLR (2), KDD (2), AAAI (1), ICML (1)
Years: 2022 (5), 2021 (12), 2020 (5), 2019 (4)
ISCA Sections: speech synthesis (4); voice conversion and adaptation (1); multi- and cross-lingual asr, other topics in asr (1)
IEEE Keywords: speech synthesis (4); text to speech (3); language translation (2); neural machine translation (2); speech intelligibility (2); speech recognition (2); natural language processing (2); dropout (1); sub networks (1); multiple teachers (1); knowledge distillation (1); random processes (1); natural languages (1); mos prediction (1); mean bias network (1); sensitivity analysis (1); video signal processing (1); correlation methods (1); speech quality assessment (1); medical image processing (1); lightweight (1); fast (1); search problems (1); autoregressive processes (1); neural architecture search (1); mixup (1); low resource (1); data augmentation (1); untranscribed data (1); adaptation (1); signal reconstruction (1); noisy speech (1); denoise (1); speech enhancement (1); frame level condition (1); speaker recognition (1); signal denoising (1); neural architecture search (nas) (1); neural net architecture (1); error propagation (1); language characteristic (1); sequence generation (1); accuracy drop (1); text analysis (1)
Most Publications: 2021 (86), 2022 (76), 2019 (70), 2020 (54), 2018 (41)

TASLP2022 Xiaobo Liang, Lijun Wu, Juntao Li, Tao Qin, Min Zhang 0005, Tie-Yan Liu, 
Multi-Teacher Distillation With Single Model for Neural Machine Translation.

Interspeech2022 Yihan Wu, Xu Tan 0003, Bohan Li 0003, Lei He 0005, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu, 
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios.

Interspeech2022 Guangyan Zhang, Kaitao Song, Xu Tan 0003, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao, 
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech.

NeurIPS2022 Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen, Xu Tan, Danilo P. Mandic, Lei He, Xiangyang Li, Tao Qin, Sheng Zhao, Tie-Yan Liu, 
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis.

ACL2022 Yi Ren 0006, Xu Tan 0003, Tao Qin, Zhou Zhao, Tie-Yan Liu, 
Revisiting Over-Smoothness in Text to Speech.

ICASSP2021 Yichong Leng, Xu Tan 0003, Sheng Zhao, Frank K. Soong, Xiang-Yang Li 0001, Tao Qin
MBNET: MOS Prediction for Synthesized Speech with Mean-Bias Network.

ICASSP2021 Renqian Luo, Xu Tan 0003, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen, Tie-Yan Liu, 
Lightspeech: Lightweight and Fast Text to Speech with Neural Architecture Search.

ICASSP2021 Linghui Meng 0001, Jin Xu 0010, Xu Tan 0003, Jindong Wang 0001, Tao Qin, Bo Xu 0002, 
MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition.

ICASSP2021 Yuzi Yan, Xu Tan 0003, Bohan Li 0003, Tao Qin, Sheng Zhao, Yuan Shen 0001, Tie-Yan Liu, 
Adaspeech 2: Adaptive Text to Speech with Untranscribed Data.

ICASSP2021 Chen Zhang 0020, Yi Ren 0006, Xu Tan 0003, Jinglin Liu, Kejun Zhang, Tao Qin, Sheng Zhao, Tie-Yan Liu, 
Denoispeech: Denoising Text to Speech with Frame-Level Noise Modeling.

Interspeech2021 Wenxin Hou, Jindong Wang 0001, Xu Tan 0003, Tao Qin, Takahiro Shinozaki, 
Cross-Domain Speech Recognition with Unsupervised Character-Level Distribution Matching.

Interspeech2021 Yuzi Yan, Xu Tan 0003, Bohan Li 0003, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan Shen 0001, Wei-Qiang Zhang, Tie-Yan Liu, 
Adaptive Text to Speech for Spontaneous Style.

NeurIPS2021 Jiawei Chen 0008, Xu Tan 0003, Yichong Leng, Jin Xu 0010, Guihua Wen, Tao Qin, Tie-Yan Liu, 
Speech-T: Transducer for Text to Speech and Beyond.

NeurIPS2021 Yichong Leng, Xu Tan 0003, Linchen Zhu, Jin Xu 0010, Renqian Luo, Linquan Liu, Tao Qin, Xiangyang Li 0001, Edward Lin, Tie-Yan Liu, 
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition.

ICLR2021 Yi Ren 0006, Chenxu Hu, Xu Tan 0003, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu, 
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

ICLR2021 Mingjian Chen, Xu Tan 0003, Bohan Li 0003, Yanqing Liu, Tao Qin, Sheng Zhao, Tie-Yan Liu, 
AdaSpeech: Adaptive Text to Speech for Custom Voice.

AAAI2021 Chen Zhang 0020, Xu Tan 0003, Yi Ren 0006, Tao Qin, Kejun Zhang, Tie-Yan Liu, 
UWSpeech: Speech to Speech Translation for Unwritten Languages.

TASLP2020 Yang Fan, Fei Tian, Yingce Xia, Tao Qin, Xiang-Yang Li 0001, Tie-Yan Liu, 
Searching Better Architectures for Neural Machine Translation.

Interspeech2020 Mingjian Chen, Xu Tan 0003, Yi Ren 0006, Jin Xu 0010, Hao Sun, Sheng Zhao, Tao Qin
MultiSpeech: Multi-Speaker Text to Speech with Transformer.

KDD2020 Yi Ren 0006, Xu Tan 0003, Tao Qin, Jian Luan 0001, Zhou Zhao, Tie-Yan Liu, 
DeepSinger: Singing Voice Synthesis with Data Mined From the Web.

#163  | Emmanuel Dupoux | Google Scholar   DBLP
Venues: Interspeech (17), ICASSP (6), ACL (2), TASLP (1)
Years: 2022 (3), 2021 (3), 2020 (7), 2019 (1), 2018 (6), 2017 (4), 2016 (2)
ISCA Sections: zero, low-resource and multi-modal speech recognition (2); special session (2); low-resource speech recognition (1); speech synthesis (1); speech and audio quality assessment (1); the zero resource speech challenge 2020 (1); diarization (1); acoustic phonetics and prosody (1); the zero resource speech challenge 2019 (1); zero-resource speech recognition (1); topics in speech recognition (1); sequence models for asr (1); speech perception (1); articulatory and acoustic phonetics (1); automatic learning of representations (1)
IEEE Keywords: natural language processing (4); speech recognition (4); text analysis (3); image retrieval (2); unsupervised learning (2); channel bank filters (2); human computer interaction (1); image representation (1); automatic speech recognition (1); speech synthesis (1); zero and low resource asr (1); dataset (1); audio signal processing (1); unsupervised and semi supervised learning (1); distant supervision (1); computational linguistics (1); speech coding (1); unsupervised pretraining (1); low resources (1); cross lingual (1); low resource asr (1); bayes methods (1); acoustic unit discovery (1); bayesian model (1); informative prior (1); multi modal data (1); unwritten languages (1); unsupervised unit discovery (1); machine translation (1); linguistics (1); time domain analysis (1); approximation theory (1); transient response (1); siamese network (1); abx (1); scattering transform (1); abnet (1)
Most Publications: 2022 (28), 2020 (28), 2021 (17), 2018 (15), 2016 (15)

Interspeech2022 Robin Algayres, Adel Nabli, Benoît Sagot, Emmanuel Dupoux
Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning.

Interspeech2022 Maureen de Seyssel, Marvin Lavechin, Yossi Adi, Emmanuel Dupoux, Guillaume Wisniewski, 
Probing phoneme, language and speaker information in unsupervised speech representations.

ACL2022 Eugene Kharitonov, Ann Lee 0001, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu, 
Text-Free Prosody-Aware Generative Spoken Language Modeling.

Interspeech2021 Ewan Dunbar, Mathieu Bernard, Nicolas Hamilakis, Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Eugene Kharitonov, Emmanuel Dupoux
The Zero Resource Speech Challenge 2021: Spoken Language Modelling.

Interspeech2021 Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.

ACL2021 Changhan Wang, Morgane Rivière, Ann Lee 0001, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Miguel Pino, Emmanuel Dupoux
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation.

TASLP2020 Odette Scharenborg, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller 0001, 
Speech Technology for Unwritten Languages.

ICASSP2020 Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdelrahman Mohamed, Emmanuel Dupoux
Libri-Light: A Benchmark for ASR with Limited or No Supervision.

ICASSP2020 Morgane Rivière, Armand Joulin, Pierre-Emmanuel Mazaré, Emmanuel Dupoux
Unsupervised Pretraining Transfers Well Across Languages.

Interspeech2020 Robin Algayres, Mohamed Salah Zaïem, Benoît Sagot, Emmanuel Dupoux
Evaluating the Reliability of Acoustic Speech Embeddings.

Interspeech2020 Ewan Dunbar, Julien Karadayi, Mathieu Bernard, Xuan-Nga Cao, Robin Algayres, Lucas Ondel, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux
The Zero Resource Speech Challenge 2020: Discovering Discrete Subword and Word Units.

Interspeech2020 Marvin Lavechin, Ruben Bousbib, Hervé Bredin, Emmanuel Dupoux, Alejandrina Cristià, 
An Open-Source Voice Type Classifier for Child-Centered Daylong Recordings.

Interspeech2020 Rachid Riad, Hadrien Titeux, Laurie Lemoine, Justine Montillot, Jennifer Hamet Bagnou, Xuan-Nga Cao, Emmanuel Dupoux, Anne-Catherine Bachoud-Lévi, 
Vocal Markers from Sustained Phonation in Huntington's Disease.

Interspeech2019 Ewan Dunbar, Robin Algayres, Julien Karadayi, Mathieu Bernard, Juan Benjumea, Xuan-Nga Cao, Lucie Miskic, Charlotte Dugrain, Lucas Ondel, Alan W. Black, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux
The Zero Resource Speech Challenge 2019: TTS Without T.

ICASSP2018 Lucas Ondel, Pierre Godard, Laurent Besacier, Elin Larsen, Mark Hasegawa-Johnson, Odette Scharenborg, Emmanuel Dupoux, Lukás Burget, François Yvon, Sanjeev Khudanpur, 
Bayesian Models for Unit Discovery on a Very Low Resource Language.

ICASSP2018 Odette Scharenborg, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller 0001, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux
Linguistic Unit Discovery from Multi-Modal Inputs in Unwritten Languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop.

ICASSP2018 Neil Zeghidour, Nicolas Usunier, Iasonas Kokkinos, Thomas Schatz, Gabriel Synnaeve, Emmanuel Dupoux
Learning Filterbanks from Raw Speech for Phone Recognition.

Interspeech2018 Nils Holzenberger, Mingxing Du, Julien Karadayi, Rachid Riad, Emmanuel Dupoux
Learning Word Embeddings: Unsupervised Methods for Fixed-size Representations of Variable-length Speech Segments.

Interspeech2018 Rachid Riad, Corentin Dancette, Julien Karadayi, Neil Zeghidour, Thomas Schatz, Emmanuel Dupoux
Sampling Strategies in Siamese Networks for Unsupervised Speech Representation Learning.

Interspeech2018 Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert, Emmanuel Dupoux
End-to-End Speech Recognition from the Raw Waveform.

#164  | Jian Wu 0027 | Google Scholar   DBLP
Venues: Interspeech (17), ICASSP (9)
Years: 2022 (4), 2021 (7), 2020 (13), 2019 (2)
ISCA Sections: source separation (2); speaker and language recognition (1); other topics in speech recognition (1); robust asr, and far-field/multi-talker asr (1); search/decoding techniques and confidence measures for asr (1); tools, corpora and resources (1); deep noise suppression challenge (1); streaming asr (1); feature extraction and distant asr (1); asr neural network architectures and training (1); singing voice computing and processing in music (1); speaker diarization (1); multi-channel speech enhancement (1); the interspeech 2020 far field speaker verification challenge (1); speech and audio source separation and scene analysis (1); asr for noisy and far-field speech (1)
IEEE Keywords: speech recognition (6); speaker recognition (5); recurrent neural nets (4); continuous speech separation (3); source separation (2); audio signal processing (2); speaker diarization (2); speech separation (2); overlapped speech (2); meeting transcription (1); recurrent selective attention network (1); conformer (1); multi speaker asr (1); transformer (1); acoustic modeling (1); natural language processing (1); multi dialect (1); mixture of experts (1); attention (1); filtering theory (1); system fusion (1); libricss (1); microphones (1); automatic speech recognition (1); permutation invariant training (1); keyword spotting (1); text analysis (1); text to speech (1); speech synthesis (1); adaptation (1); rnn t (1); deep speaker embedding (1); matrix algebra (1); graph theory (1); graph neural networks (1); pattern clustering (1); audio visual systems (1); audio visual speech recognition (1); multi modal (1); lstm (1); convolutional neural nets (1); attentive pooling (1); speaker verification (1); cnn (1)
Most Publications: 2020 (25), 2021 (20), 2022 (19), 2019 (8), 2008 (4)

Affiliations
Microsoft Corporation, USA
Northwestern Polytechnical University, Xi'an, China

ICASSP2022 Yixuan Zhang, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li 0001, 
Continuous Speech Separation with Recurrent Selective Attention Network.

Interspeech2022 Sanyuan Chen, Yu Wu 0012, Chengyi Wang 0002, Shujie Liu 0001, Zhuo Chen 0006, Peidong Wang, Gang Liu, Jinyu Li 0001, Jian Wu 0027, Xiangzhan Yu, Furu Wei, 
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

Interspeech2022 Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.

Interspeech2022 Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.

ICASSP2021 Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Jinyu Li 0001, Takuya Yoshioka, Chengyi Wang 0002, Shujie Liu 0001, Ming Zhou 0001, 
Continuous Speech Separation with Conformer.

ICASSP2021 Amit Das, Kshitiz Kumar, Jian Wu 0027
Multi-Dialect Speech Recognition in English Using Attention on Ensemble of Experts.

ICASSP2021 Xiong Xiao, Naoyuki Kanda, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao 0008, Gang Liu, Yu Wu 0012, Jian Wu 0027, Shujie Liu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.

Interspeech2021 Amber Afshan, Kshitiz Kumar, Jian Wu 0027
Sequence-Level Confidence Classifier for ASR Utterance Accuracy and Application to Acoustic Models.

Interspeech2021 Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Shujie Liu 0001, Jinyu Li 0001, Xiangzhan Yu, 
Ultra Fast Speech Separation Model with Teacher Student Learning.

Interspeech2021 Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen 0006, Yanxin Hu, Lei Xie 0001, Jian Wu 0027, Hui Bu, Xin Xu, Jun Du, Jingdong Chen, 
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario.

Interspeech2021 Jian Wu 0027, Zhuo Chen 0006, Sanyuan Chen, Yu Wu 0012, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, 
Investigation of Practical Aspects of Single Channel Speech Separation for ASR.

ICASSP2020 Zhuo Chen 0006, Takuya Yoshioka, Liang Lu 0001, Tianyan Zhou, Zhong Meng, Yi Luo 0004, Jian Wu 0027, Xiong Xiao, Jinyu Li 0001, 
Continuous Speech Separation: Dataset and Analysis.

ICASSP2020 Eva Sharma, Guoli Ye, Wenning Wei, Rui Zhao 0017, Yao Tian, Jian Wu 0027, Lei He 0005, Ed Lin, Yifan Gong 0001, 
Adaptation of RNN Transducer with Text-To-Speech Technology for Keyword Spotting.

ICASSP2020 Jixuan Wang, Xiong Xiao, Jian Wu 0027, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno, 
Speaker Diarization with Session-Level Speaker Embedding Refinement Using Graph Neural Networks.

ICASSP2020 Jianwei Yu, Shi-Xiong Zhang, Jian Wu 0027, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu 0001, 
Audio-Visual Recognition of Overlapped Speech for the LRS2 Dataset.

ICASSP2020 Yong Zhao 0008, Tianyan Zhou, Zhuo Chen 0006, Jian Wu 0027
Improving Deep CNN Networks with Long Temporal Context for Text-Independent Speaker Verification.

Interspeech2020 Yanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu 0027, Bihong Zhang, Lei Xie 0001, 
DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement.

Interspeech2020 Kshitiz Kumar, Chaojun Liu, Yifan Gong 0001, Jian Wu 0027
1-D Row-Convolution LSTM: Fast Streaming ASR at Accuracy Parity with LC-BLSTM.

Interspeech2020 Kshitiz Kumar, Bo Ren, Yifan Gong 0001, Jian Wu 0027
Bandpass Noise Generation and Augmentation for Unified ASR.

Interspeech2020 Kshitiz Kumar, Emilian Stoimenov, Hosam Khalil, Jian Wu 0027
Fast and Slow Acoustic Model.

#165  | Jianzong Wang | Google Scholar   DBLP
Venues: Interspeech (15), ICASSP (11)
Years: 2022 (11), 2021 (9), 2020 (6)
ISCA Sections: speech synthesis (3); speech emotion recognition (1); source separation (1); voice conversion and adaptation (1); acoustic event detection and classification (1); speech signal analysis and representation (1); graph and end-to-end learning for speaker recognition (1); embedding and network architecture for speaker recognition (1); acoustic event detection and acoustic scene classification (1); spoken language understanding (1); dnn architectures for speaker recognition (1); topics in asr (1); phonetic event detection and segmentation (1)
IEEE Keywords: speech synthesis (6); natural language processing (5); speaker recognition (4); speech recognition (3); voice conversion (2); zero shot (2); text analysis (2); transformer (2); regression analysis (1); pattern classification (1); variance regularization (1); speaker age estimation (1); attribute inference (1); label distribution learning (1); vector quantization (1); contrastive learning (1); any to any (1); low resource (1); self supervised (1); object detection (1); query processing (1); patch embedding (1); visual dialog (1); multi modal (1); computer vision (1); pattern clustering (1); question answering (information retrieval) (1); interactive systems (1); self attention weight matrix (1); incomplete utterance rewriting (1); text edit (1); synthetic noise (1); adversarial perturbation (1); contextual information (1); grapheme to phoneme (1); multi speaker text to speech (1); conditional variational autoencoder (1); computational linguistics (1); continual learning (1); intent detection (1); slot filling (1); self attention (1); rnn transducer (1); recurrent neural nets (1); convolutional codes (1); waveform generation (1); waveform generators (1); location variable convolution (1); vocoder (1); convolution (1); vocoders (1); graph theory (1); speech coding (1); prosody modelling (1); graph neural network (1); text to speech (1)
Most Publications: 2022 (97), 2021 (50), 2020 (35), 2023 (14), 2019 (7)

Affiliations
Ping An Technology (Shenzhen) Co., Ltd., China

ICASSP2022 Shijing Si, Jianzong Wang, Junqing Peng, Jing Xiao 0006, 
Towards Speaker Age Estimation With Label Distribution Learning.

ICASSP2022 Huaizhen Tang, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Avqvc: One-Shot Voice Conversion By Vector Quantization With Applying Contrastive Learning.

ICASSP2022 Qiqi Wang 0005, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning.

ICASSP2022 Tong Ye, Shijing Si, Jianzong Wang, Rui Wang, Ning Cheng 0001, Jing Xiao 0006, 
VU-BERT: A Unified Framework for Visual Dialog.

ICASSP2022 Yong Zhang, Zhitao Li, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Self-Attention for Incomplete Utterance Rewriting.

ICASSP2022 Chendong Zhao, Jianzong Wang, Xiaoyang Qu, Haoqian Wang, Jing Xiao 0006, 
r-G2P: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion by Controlled Noise Introducing and Contextual Information Incorporation.

ICASSP2022 Botao Zhao, Xulong Zhang 0001, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-Shot Multi-speaker text-to-speech.

Interspeech2022 Zuheng Kang, Junqing Peng, Jianzong Wang, Jing Xiao 0006, 
SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning.

Interspeech2022 Jian Luo, Jianzong Wang, Ning Cheng 0001, Edward Xiao, Xulong Zhang 0001, Jing Xiao 0006, 
Tiny-Sepformer: A Tiny Time-Domain Transformer Network For Speech Separation.

Interspeech2022 Sicheng Yang, Methawee Tantrawenith, Haolin Zhuang, Zhiyong Wu 0001, Aolan Sun, Jianzong Wang, Ning Cheng 0001, Huaizhen Tang, Xintao Zhao, Jie Wang, Helen Meng, 
Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion.

Interspeech2022 Tong Ye, Shijing Si, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Uncertainty Calibration for Deep Audio Classifiers.

ICASSP2021 Yanfei Hui, Jianzong Wang, Ning Cheng 0001, Fengying Yu, Tianbo Wu, Jing Xiao 0006, 
Joint Intent Detection and Slot Filling Based on Continual Learning Model.

ICASSP2021 Jian Luo, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Unidirectional Memory-Self-Attention Transducer for Online Speech Recognition.

ICASSP2021 Zhen Zeng, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation.

Interspeech2021 Zhenhou Hong, Jianzong Wang, Xiaoyang Qu, Jie Liu, Chendong Zhao, Jing Xiao 0006, 
Federated Learning with Dynamic Transformer for Text to Speech.

Interspeech2021 Jian Luo, Jianzong Wang, Ning Cheng 0001, Jing Xiao 0006, 
Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation.

Interspeech2021 Junyi Peng, Xiaoyang Qu, Rongzhi Gu, Jianzong Wang, Jing Xiao 0006, Lukás Burget, Jan Cernocký, 
Effective Phase Encoding for End-To-End Speaker Verification.

Interspeech2021 Junyi Peng, Xiaoyang Qu, Jianzong Wang, Rongzhi Gu, Jing Xiao 0006, Lukás Burget, Jan Cernocký, 
ICSpk: Interpretable Complex Speaker Embedding Extractor from Raw Waveform.

Interspeech2021 Shijing Si, Jianzong Wang, Xiaoyang Qu, Ning Cheng 0001, Wenqi Wei, Xinghua Zhu, Jing Xiao 0006, 
Speech2Video: Cross-Modal Distillation for Speech to Video Generation.

Interspeech2021 Shijing Si, Jianzong Wang, Huiming Sun, Jianhan Wu, Chuanyao Zhang, Xiaoyang Qu, Ning Cheng 0001, Lei Chen, Jing Xiao 0006, 
Variational Information Bottleneck for Effective Low-Resource Audio Classification.

#166  | Yu Wu 0012 | Google Scholar   DBLP
Venues: Interspeech (14), ICASSP (7), ACL (2), AAAI (2), ICML (1)
Years: 2022 (10), 2021 (10), 2020 (6)
ISCA Sections: novel models and training methods for asr (3); source separation (2); multi- and cross-lingual asr, other topics in asr (2); speaker and language recognition (1); other topics in speech recognition (1); robust asr, and far-field/multi-talker asr (1); neural network training methods for asr (1); asr model training and strategies (1); streaming asr (1); asr neural network architectures (1)
IEEE Keywords: speaker recognition (4); speech recognition (4); transformer (3); representation learning (2); speech enhancement (2); natural language processing (2); self supervised learning (2); speech separation (2); source separation (2); recurrent neural nets (2); unsupervised learning (1); image representation (1); speaker verification (1); self supervised pretrain (1); contrastive learning (1); wav2vec 2.0 (1); robust speech recognition (1); supervised learning (1); automatic speech recognition (1); signal representation (1); multi channel microphone (1); deep learning (artificial intelligence) (1); transducer (1); decoding (1); encoding (1); real time decoding (1); conformer (1); multi speaker asr (1); continuous speech separation (1); filtering theory (1); audio signal processing (1); speaker diarization (1); system fusion (1)
Most Publications: 2022 (26), 2021 (26), 2020 (18), 2018 (13), 2019 (11)

Affiliations
Microsoft Research Asia, Beijing, China
Beihang University, State Key Lab of Software Development Environment, Beijing, China

ICASSP2022 Zhengyang Chen, Sanyuan Chen, Yu Wu 0012, Yao Qian, Chengyi Wang 0002, Shujie Liu 0001, Yanmin Qian, Michael Zeng 0001, 
Large-Scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification.

ICASSP2022 Yiming Wang, Jinyu Li 0001, Heming Wang, Yao Qian, Chengyi Wang 0002, Yu Wu 0012
Wav2vec-Switch: Contrastive Learning from Original-Noisy Speech Pairs for Robust Speech Recognition.

ICASSP2022 Chengyi Wang 0002, Yu Wu 0012, Sanyuan Chen, Shujie Liu 0001, Jinyu Li 0001, Yao Qian, Zhenglu Yang, 
Improving Self-Supervised Learning for Speech Recognition with Intermediate Layer Supervision.

Interspeech2022 Sanyuan Chen, Yu Wu 0012, Chengyi Wang 0002, Shujie Liu 0001, Zhuo Chen 0006, Peidong Wang, Gang Liu, Jinyu Li 0001, Jian Wu 0027, Xiangzhan Yu, Furu Wei, 
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

Interspeech2022 Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.

Interspeech2022 Naoyuki Kanda, Jian Wu 0027, Yu Wu 0012, Xiong Xiao, Zhong Meng, Xiaofei Wang 0009, Yashesh Gaur, Zhuo Chen 0006, Jinyu Li 0001, Takuya Yoshioka, 
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.

Interspeech2022 Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li 0001, Xie Chen 0001, Yu Wu 0012, Yifan Gong 0001, 
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.

Interspeech2022 Shuo Ren, Shujie Liu 0001, Yu Wu 0012, Long Zhou, Furu Wei, 
Speech Pre-training with Acoustic Piece.

Interspeech2022 Chengyi Wang 0002, Yiming Wang, Yu Wu 0012, Sanyuan Chen, Jinyu Li 0001, Shujie Liu 0001, Furu Wei, 
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training.

ACL2022 Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang 0002, Shuo Ren, Yu Wu 0012, Shujie Liu 0001, Tom Ko, Qing Li, Yu Zhang 0006, Zhihua Wei, Yao Qian, Jinyu Li 0001, Furu Wei, 
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing.

ICASSP2021 Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Takuya Yoshioka, Shujie Liu 0001, Jin-Yu Li 0001, Xiangzhan Yu, 
Don't Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer.

ICASSP2021 Xie Chen 0001, Yu Wu 0012, Zhenghao Wang, Shujie Liu 0001, Jinyu Li 0001, 
Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset.

ICASSP2021 Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Jinyu Li 0001, Takuya Yoshioka, Chengyi Wang 0002, Shujie Liu 0001, Ming Zhou 0001, 
Continuous Speech Separation with Conformer.

ICASSP2021 Xiong Xiao, Naoyuki Kanda, Zhuo Chen 0006, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao 0008, Gang Liu, Yu Wu 0012, Jian Wu 0027, Shujie Liu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.

Interspeech2021 Sanyuan Chen, Yu Wu 0012, Zhuo Chen 0006, Jian Wu 0027, Takuya Yoshioka, Shujie Liu 0001, Jinyu Li 0001, Xiangzhan Yu, 
Ultra Fast Speech Separation Model with Teacher Student Learning.

Interspeech2021 Naoyuki Kanda, Guoli Ye, Yu Wu 0012, Yashesh Gaur, Xiaofei Wang 0009, Zhong Meng, Zhuo Chen 0006, Takuya Yoshioka, 
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone.

Interspeech2021 Zhong Meng, Yu Wu 0012, Naoyuki Kanda, Liang Lu 0001, Xie Chen 0001, Guoli Ye, Eric Sun, Jinyu Li 0001, Yifan Gong 0001, 
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition.

Interspeech2021 Eric Sun, Jinyu Li 0001, Zhong Meng, Yu Wu 0012, Jian Xue, Shujie Liu 0001, Yifan Gong 0001, 
Improving Multilingual Transformer Transducer Models by Reducing Language Confusions.

Interspeech2021 Jian Wu 0027, Zhuo Chen 0006, Sanyuan Chen, Yu Wu 0012, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu 0001, Jinyu Li 0001, 
Investigation of Practical Aspects of Single Channel Speech Separation for ASR.

ICML2021 Chengyi Wang 0002, Yu Wu 0012, Yao Qian, Ken'ichi Kumatani, Shujie Liu 0001, Furu Wei, Michael Zeng 0001, Xuedong Huang 0001, 
UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data.

#167  | Alex Waibel | Google Scholar   DBLP
Venues: Interspeech: 18; ICASSP: 5; NAACL: 1; SpeechComm: 1
Years: 2022: 2; 2021: 2; 2020: 3; 2019: 2; 2018: 7; 2017: 5; 2016: 4
ISCA Sections: special session: 2; voice conversion and adaptation: 1; asr: 1; streaming for asr/rnn transducers: 1; cross/multi-lingual and code-switched asr: 1; streaming asr: 1; asr neural network architectures: 1; end-to-end speech recognition: 1; adjusting to speaker, accent, and domain: 1; extracting information from audio: 1; selected topics in neural speech processing: 1; acoustic modelling: 1; asr systems and technologies: 1; speech translation: 1; search, computational strategies and language modeling: 1; low resource speech recognition: 1; speech translation and metadata for linguistic/discourse structure: 1
IEEE Keywords: speech recognition: 5; multilingual: 2; natural language processing: 2; ctc: 2; signal sampling: 1; self attention: 1; speed perturbation: 1; sequence to sequence: 1; data augmentation: 1; frequency domain analysis: 1; sub sequence: 1; low resource: 1; recurrent neural nets: 1; automatic speech recognition: 1; connectionist temporal classification: 1; language feature vectors: 1; optimisation: 1; feature combination: 1; decoding: 1; hidden markov models: 1; entropy: 1; posterior probability: 1; pattern classification: 1; articulatory features: 1; dblstms: 1; language documentation: 1; document handling: 1; phoneme segmentation: 1; pattern clustering: 1; feedforward neural nets: 1; lstms: 1; rnns: 1; acoustic modeling: 1; decision trees: 1; signal classification: 1
Most Publications: 2018: 33; 2017: 28; 2016: 28; 2013: 27; 2019: 24

Affiliations
Karlsruhe Institute of Technology, Department of Informatics
Carnegie Mellon University, Computer Science Department

Interspeech2022 Tuan-Nam Nguyen, Ngoc-Quan Pham, Alexander Waibel
Accent Conversion using Pre-trained Model and Synthesized Data from Voice Conversion.

Interspeech2022 Ngoc-Quan Pham, Alexander Waibel, Jan Niehues, 
Adaptive multilingual speech recognition with pretrained models.

Interspeech2021 Thai-Son Nguyen, Sebastian Stüker, Alex Waibel
Super-Human Performance in Online Low-Latency Recognition of Conversational Speech.

Interspeech2021 Ngoc-Quan Pham, Tuan-Nam Nguyen, Sebastian Stüker, Alex Waibel
Efficient Weight Factorization for Multilingual Speech Recognition.

ICASSP2020 Thai-Son Nguyen, Sebastian Stüker, Jan Niehues, Alex Waibel
Improving Sequence-To-Sequence Speech Recognition Training with On-The-Fly Data Augmentation.

Interspeech2020 Thai-Son Nguyen, Ngoc-Quan Pham, Sebastian Stüker, Alex Waibel
High Performance Sequence-to-Sequence Model for Streaming Speech Recognition.

Interspeech2020 Ngoc-Quan Pham, Thanh-Le Ha, Tuan-Nam Nguyen, Thai-Son Nguyen, Elizabeth Salesky, Sebastian Stüker, Jan Niehues, Alex Waibel
Relative Positional Encoding for Speech Recognition and Direct Translation.

Interspeech2019 Ngoc-Quan Pham, Thai-Son Nguyen, Jan Niehues, Markus Müller 0001, Alex Waibel
Very Deep Self-Attention Networks for End-to-End Speech Recognition.

NAACL2019 Elizabeth Salesky, Matthias Sperber, Alexander Waibel
Fluent Translations from Disfluent Speech in End-to-End Speech Translation.

ICASSP2018 Markus Müller 0001, Sebastian Stüker, Alex Waibel
Multilingual Adaptation of RNN Based ASR Systems.

ICASSP2018 Thai-Son Nguyen, Sebastian Stüker, Alex Waibel
Exploring Ctc-Network Derived Features with Conventional Hybrid System.

Interspeech2018 Markus Müller 0001, Sebastian Stüker, Alex Waibel
Neural Language Codes for Multilingual Acoustic Models.

Interspeech2018 Maren Kucza, Jan Niehues, Thomas Zenkel, Alex Waibel, Sebastian Stüker, 
Term Extraction via Neural Sequence Labeling a Comparative Evaluation of Strategies Using Recurrent Neural Networks.

Interspeech2018 Jan Niehues, Ngoc-Quan Pham, Thanh-Le Ha, Matthias Sperber, Alex Waibel
Low-Latency Neural Speech Translation.

Interspeech2018 Matthias Sperber, Jan Niehues, Graham Neubig, Sebastian Stüker, Alex Waibel
Self-Attentional Acoustic Models.

Interspeech2018 Thomas Zenkel, Ramon Sanabria, Florian Metze, Alex Waibel
Subword and Crossword Units for CTC Acoustic Models.

SpeechComm2017 Matthias Sperber, Graham Neubig, Jan Niehues, Satoshi Nakamura 0001, Alex Waibel
Transcribing against time.

ICASSP2017 Markus Müller 0001, Jörg Franke, Alex Waibel, Sebastian Stüker, 
Towards phoneme inventory discovery for documentation of unwritten languages.

Interspeech2017 Eunah Cho, Jan Niehues, Alex Waibel
NMT-Based Segmentation and Punctuation Insertion for Real-Time Spoken Language Translation.

Interspeech2017 Robin Ruede, Markus Müller 0001, Sebastian Stüker, Alex Waibel
Enhancing Backchannel Prediction Using Word Embeddings.

#168  | Shinnosuke Takamichi | Google Scholar   DBLP
Venues: Interspeech: 15; ICASSP: 6; TASLP: 3; SpeechComm: 1
Years: 2022: 6; 2021: 4; 2020: 5; 2019: 2; 2018: 3; 2017: 3; 2016: 2
ISCA Sections: speech synthesis: 7; speech coding and restoration: 1; the voicemos challenge: 1; spoken language processing: 1; speech annotation and speech assessment: 1; speech synthesis paradigms and methods: 1; speech in the brain: 1; voice conversion: 1; special session: 1
IEEE Keywords: speech synthesis: 7; text to speech synthesis: 3; speaker embedding: 2; speaker recognition: 2; gaussian processes: 2; voice conversion: 2; modulation spectrum: 2; speech recognition: 2; statistical parametric speech synthesis: 2; generative adversarial networks: 2; perceptual speaker similarity: 1; multi speaker generative modeling: 1; deep speaker representation learning: 1; active learning: 1; domain adaptation: 1; mutual information: 1; text analysis: 1; cross lingual: 1; minimum phase filter: 1; sub band processing: 1; deep neural network: 1; spectral differentials: 1; hilbert transforms: 1; filtering theory: 1; inter utterance pitch variation: 1; music: 1; dnn based singing voice synthesis: 1; moment matching network: 1; artificial double tracking: 1; over smoothing: 1; deep neural networks: 1; vae based non parallel vc: 1; d vectors: 1; many to many vc: 1; phonetic posteriorgrams: 1; mean square error methods: 1; vocoder free spss: 1; multi resolution: 1; fourier transform spectra: 1; stft spectra: 1; fourier transforms: 1; vocoders: 1; signal resolution: 1; channel bank filters: 1; generative adversarial training: 1; dnn based speech synthesis: 1; anti spoofing verification: 1; multitask learning: 1; training algorithm: 1; formal verification: 1; regression analysis: 1; pattern classification: 1; post filter: 1; trees (mathematics): 1; smoothing methods: 1; gmm based voice conversion: 1; clustergen: 1; hidden markov models: 1; mixture models: 1; global variance: 1; oversmoothing: 1
Most Publications: 2022: 27; 2020: 20; 2021: 18; 2019: 14; 2018: 13


Interspeech2022 Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari, 
Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis.

Interspeech2022 Yuto Nishimura, Yuki Saito, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari, 
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History.

Interspeech2022 Takaaki Saeki, Shinnosuke Takamichi, Tomohiko Nakamura, Naoko Tanji, Hiroshi Saruwatari, 
SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling.

Interspeech2022 Takaaki Saeki, Detai Xin, Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari, 
UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022.

Interspeech2022 Yuki Saito, Yuto Nishimura, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari, 
STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent.

Interspeech2022 Shinnosuke Takamichi, Wataru Nakata, Naoko Tanji, Hiroshi Saruwatari, 
J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis.

SpeechComm2021 Masashi Aso, Shinnosuke Takamichi, Norihiro Takamune, Hiroshi Saruwatari, 
Acoustic model-based subword tokenization and prosodic-context extraction without language knowledge for text-to-speech synthesis.

TASLP2021 Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Perceptual-Similarity-Aware Deep Speaker Representation Learning for Multi-Speaker Generative Modeling.

ICASSP2021 Detai Xin, Tatsuya Komatsu, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Disentangled Speaker and Language Representations Using Mutual Information Minimization and Domain Adaptation for Cross-Lingual TTS.

Interspeech2021 Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari, 
Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis.

ICASSP2020 Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Lifter Training and Sub-Band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials.

Interspeech2020 Masashi Aso, Shinnosuke Takamichi, Hiroshi Saruwatari, 
End-to-End Text-to-Speech Synthesis with Unaligned Multiple Language Units Based on Attention.

Interspeech2020 Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU.

Interspeech2020 Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari, 
Cross-Lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space.

Interspeech2020 Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari, 
Investigating Effective Additional Contextual Factors in DNN-Based Spontaneous Speech Synthesis.

ICASSP2019 Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari, 
Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking.

Interspeech2019 Ivan Halim Parmonangan, Hiroki Tanaka, Sakriani Sakti, Shinnosuke Takamichi, Satoshi Nakamura 0001, 
Speech Quality Evaluation of Synthesized Japanese Speech Using EEG.

TASLP2018 Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks.

ICASSP2018 Yuki Saito, Yusuke Ijima, Kyosuke Nishida, Shinnosuke Takamichi
Non-Parallel Voice Conversion Using Variational Autoencoders Conditioned by Phonetic Posteriorgrams and D-Vectors.

ICASSP2018 Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari, 
Text-to-Speech Synthesis Using STFT Spectra Based on Low-/Multi-Resolution Generative Adversarial Networks.

#169  | Jiqing Han | Google Scholar   DBLP
Venues: Interspeech: 16; ICASSP: 7; TASLP: 2
Years: 2022: 3; 2021: 3; 2020: 9; 2019: 6; 2018: 3; 2017: 1
ISCA Sections: single-channel speech enhancement: 2; audio signal characterization: 2; acoustic scenes and rare events: 2; multimodal speech emotion recognition and paralinguistics: 1; robust speaker recognition: 1; emotion and sentiment analysis: 1; neural network training methods for asr: 1; noise robust and distant speech recognition: 1; acoustic scene classification: 1; emotion modeling and analysis: 1; speech and audio source separation and scene analysis: 1; speech enhancement: 1; speaker characterization and recognition: 1
IEEE Keywords: probability: 3; speaker recognition: 3; speaker verification: 3; speech recognition: 3; speech intelligibility: 2; convolutional neural nets: 2; speech enhancement: 2; monaural speech enhancement: 2; semi supervised sound event detection: 1; signal classification: 1; sparsemax: 1; sparse self attention: 1; audio signal processing: 1; signal detection: 1; pairwise distance distributions: 1; open set domain adaptation: 1; speech coding: 1; recurrent neural nets: 1; denoising autoencoder: 1; generative vocoder: 1; vocoders: 1; joint framework: 1; emotion recognition: 1; cross corpus: 1; natural language processing: 1; transfer subspace learning: 1; speech emotion recognition: 1; text analysis: 1; matrix decomposition: 1; non negative matrix factorization: 1; i vector framework: 1; gaussian processes: 1; end to end: 1; task driven multilevel framework: 1; iterative methods: 1; phonetic posteriorgram: 1; phoneme aware network: 1; structured sparse: 1; end to end: 1; backpropagation: 1; attention: 1; automatic speech recognition: 1; generative adversarial training: 1; time frequency analysis: 1; speech separation: 1; cocktail party problem: 1; gated convolutional neural network: 1; permutation invariant training: 1; source separation: 1; pattern classification: 1; plda: 1; discriminative training: 1; dnn: 1
Most Publications: 2019: 18; 2020: 17; 2011: 12; 2021: 8; 2012: 8


ICASSP2022 Yadong Guan, Jiabin Xue, Guibin Zheng, Jiqing Han
Sparse Self-Attention for Semi-Supervised Sound Event Detection.

ICASSP2022 Jianchen Li, Jiqing Han, Hongwei Song, 
CDMA: Cross-Domain Distance Metric Adaptation for Speaker Verification.

Interspeech2022 Fan Qian, Hongwei Song, Jiqing Han
Word-wise Sparse Attention for Multimodal Sentiment Analysis.

Interspeech2021 Jianchen Li, Jiqing Han, Hongwei Song, 
Gradient Regularization for Noise-Robust Speaker Verification.

Interspeech2021 Fan Qian, Jiqing Han
Multimodal Sentiment Analysis with Temporal Modality Attention.

Interspeech2021 Jiabin Xue, Tieran Zheng, Jiqing Han
Model-Agnostic Fast Adaptive Multi-Objective Balancing Algorithm for Multilingual Automatic Speech Recognition Model Training.

TASLP2020 Zhihao Du, Xueliang Zhang 0001, Jiqing Han
A Joint Framework of Denoising Autoencoder and Generative Vocoder for Monaural Speech Enhancement.

TASLP2020 Hui Luo, Jiqing Han
Nonnegative Matrix Factorization Based Transfer Subspace Learning for Cross-Corpus Speech Emotion Recognition.

ICASSP2020 Chen Chen, Jiqing Han
TDMF: Task-Driven Multilevel Framework for End-to-End Speaker Verification.

ICASSP2020 Zhihao Du, Ming Lei, Jiqing Han, Shiliang Zhang, 
Pan: Phoneme-Aware Network for Monaural Speech Enhancement.

ICASSP2020 Jiabin Xue, Tieran Zheng, Jiqing Han
Structured Sparse Attention for end-to-end Automatic Speech Recognition.

Interspeech2020 Zhihao Du, Jiqing Han, Xueliang Zhang 0001, 
Double Adversarial Network Based Monaural Speech Enhancement for Robust Speech Recognition.

Interspeech2020 Zhihao Du, Ming Lei, Jiqing Han, Shiliang Zhang, 
Self-Supervised Adversarial Multi-Task Learning for Vocoder-Based Monaural Speech Enhancement.

Interspeech2020 Ziqiang Shi, Rujie Liu, Jiqing Han
Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss.

Interspeech2020 Liwen Zhang, Jiqing Han, Ziqiang Shi, 
ATReSN-Net: Capturing Attentive Temporal Relations in Semantic Neighborhood for Acoustic Scene Classification.

ICASSP2019 Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Shoji Hayakawa, Jiqing Han
Furcax: End-to-end Monaural Speech Separation Based on Deep Gated (De)convolutional Neural Networks with Adversarial Example Training.

Interspeech2019 Hui Luo, Jiqing Han
Cross-Corpus Speech Emotion Recognition Using Semi-Supervised Transfer Non-Negative Matrix Factorization with Adaptation Regularization.

Interspeech2019 Qiuying Shi, Hui Luo, Jiqing Han
Subspace Pooling Based Temporal Features Extraction for Audio Event Recognition.

Interspeech2019 Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Shoji Hayakawa, Shouji Harada, Jiqing Han
End-to-End Monaural Speech Separation with Multi-Scale Dynamic Weighted Gated Dilated Convolutional Pyramid Network.

Interspeech2019 Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Jiqing Han, Anyan Shi, 
Deep Attention Gated Dilated Temporal Convolutional Networks with Intra-Parallel Convolutional Modules for End-to-End Monaural Speech Separation.

#170  | Kartik Audhkhasi | Google Scholar   DBLP
Venues: Interspeech: 16; ICASSP: 9
Years: 2022: 1; 2021: 3; 2020: 4; 2019: 8; 2018: 2; 2017: 4; 2016: 3
ISCA Sections: model training for asr: 2; novel models and training methods for asr: 1; streaming for asr/rnn transducers: 1; low-resource speech recognition: 1; neural network training methods for asr: 1; multilingual and code-switched asr: 1; spoken language understanding: 1; asr neural network architectures and training: 1; asr neural network training: 1; resources – annotation – evaluation: 1; rich transcription and asr systems: 1; sequence-to-sequence speech recognition: 1; neural network acoustic models for asr: 1; conversational telephone speech recognition: 1; low resource speech recognition: 1
IEEE Keywords: speech recognition: 7; natural language processing: 4; automatic speech recognition: 4; text analysis: 2; end to end systems: 2; direct acoustics to word models: 2; keyword search: 2; speech to intent: 1; synthetic speech augmentation: 1; pre trained text embedding: 1; spoken language understanding: 1; end to end asr: 1; noise injection: 1; triplet contrastive loss: 1; connectionist temporal classification: 1; acoustic word embeddings: 1; hidden markov models: 1; decoding: 1; end to end models: 1; multi task learning: 1; acoustic modeling: 1; multi accent speech recognition: 1; end to end models: 1; speaker recognition: 1; query processing: 1; recurrent neural nets: 1; audio coding: 1; feedforward neural nets: 1; attention networks: 1; end to end speech recognition: 1; ctc: 1; representation learning: 1; programming language semantics: 1; language modeling: 1; word embeddings: 1; error analysis: 1; regression analysis: 1; one vs one multi class classification: 1; random fourier features: 1; large scale kernel machines: 1; deep neural networks: 1
Most Publications: 2019: 13; 2017: 11; 2020: 10; 2013: 9; 2018: 6

Affiliations
University of Southern California, Los Angeles, USA

Interspeech2022 Kartik Audhkhasi, Yinghui Huang, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition.

Interspeech2021 Kartik Audhkhasi, Tongzhou Chen, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Mixture Model Attention: Flexible Streaming and Non-Streaming Automatic Speech Recognition.

Interspeech2021 Andrew Rouditchenko, Angie W. Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas 0001, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogério Schmidt Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba 0001, James R. Glass, 
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.

Interspeech2021 Hainan Xu, Kartik Audhkhasi, Yinghui Huang, Jesse Emond, Bhuvana Ramabhadran, 
Regularizing Word Segmentation by Creating Misspellings.

ICASSP2020 Yinghui Huang, Hong-Kwang Kuo, Samuel Thomas 0001, Zvi Kons, Kartik Audhkhasi, Brian Kingsbury, Ron Hoory, Michael Picheny, 
Leveraging Unpaired Text Data for Training End-To-End Speech-to-Intent Systems.

Interspeech2020 Samuel Thomas 0001, Kartik Audhkhasi, Brian Kingsbury, 
Transliteration Based Data Augmentation for Training Multilingual ASR Acoustic Models in Low Resource Settings.

Interspeech2020 Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas 0001, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis A. Lastras, 
End-to-End Spoken Language Understanding Without Full Transcripts.

Interspeech2020 Zoltán Tüske, George Saon, Kartik Audhkhasi, Brian Kingsbury, 
Single Headed Attention Based Sequence-to-Sequence Model for State-of-the-Art Results on Switchboard.

ICASSP2019 George Saon, Zoltán Tüske, Kartik Audhkhasi, Brian Kingsbury, 
Sequence Noise Injected Training for End-to-end Speech Recognition.

ICASSP2019 Shane Settle, Kartik Audhkhasi, Karen Livescu, Michael Picheny, 
Acoustically Grounded Word Embeddings for Improved Acoustics-to-word Speech Recognition.

Interspeech2019 Kartik Audhkhasi, George Saon, Zoltán Tüske, Brian Kingsbury, Michael Picheny, 
Forget a Bit to Learn Better: Soft Forgetting for CTC-Based Automatic Speech Recognition.

Interspeech2019 Gakuto Kurata, Kartik Audhkhasi
Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation.

Interspeech2019 Gakuto Kurata, Kartik Audhkhasi
Multi-Task CTC Training with Auxiliary Feature Reconstruction for End-to-End Speech Recognition.

Interspeech2019 Michael Picheny, Zoltán Tüske, Brian Kingsbury, Kartik Audhkhasi, Xiaodong Cui, George Saon, 
Challenging the Boundaries of Speech Recognition: The MALACH Corpus.

Interspeech2019 Samuel Thomas 0001, Kartik Audhkhasi, Zoltán Tüske, Yinghui Huang, Michael Picheny, 
Detection and Recovery of OOVs for Improved English Broadcast News Captioning.

Interspeech2019 Zoltán Tüske, Kartik Audhkhasi, George Saon, 
Advancing Sequence-to-Sequence Based Speech Recognition.

ICASSP2018 Kartik Audhkhasi, Brian Kingsbury, Bhuvana Ramabhadran, George Saon, Michael Picheny, 
Building Competitive Direct Acoustics-to-Word Models for English Conversational Speech Recognition.

ICASSP2018 Xuesong Yang, Kartik Audhkhasi, Andrew Rosenberg, Samuel Thomas 0001, Bhuvana Ramabhadran, Mark Hasegawa-Johnson, 
Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition.

ICASSP2017 Kartik Audhkhasi, Andrew Rosenberg, Abhinav Sethy, Bhuvana Ramabhadran, Brian Kingsbury, 
End-to-end ASR-free keyword search from speech.

ICASSP2017 Andrew Rosenberg, Kartik Audhkhasi, Abhinav Sethy, Bhuvana Ramabhadran, Michael Picheny, 
End-to-end speech recognition and keyword search on low-resource languages.

#171  | Zhongqiu Wang | Google Scholar   DBLP
Venues: ICASSP: 12; TASLP: 7; Interspeech: 6
Years: 2022: 2; 2021: 2; 2020: 2; 2019: 5; 2018: 7; 2017: 4; 2016: 3
ISCA Sections: spatial and phase cues for source separation and speech recognition: 2; speech enhancement and intelligibility: 1; speaker and language recognition: 1; deep enhancement: 1; deep learning for source separation and pitch tracking: 1
IEEE Keywords: speaker recognition: 8; deep neural networks: 8; speech enhancement: 6; speech recognition: 6; reverberation: 5; speech separation: 4; ideal ratio mask: 4; robust asr: 4; microphone array processing: 3; source separation: 3; direction of arrival estimation: 3; time frequency analysis: 3; microphone arrays: 3; array signal processing: 3; deep clustering: 3; time frequency masking: 3; iterative methods: 3; robust speaker localization: 2; speech dereverberation: 2; deep learning (artificial intelligence): 2; complex spectral mapping: 2; covariance matrices: 2; microphones: 2; beamforming: 2; audio signal processing: 2; speech intelligibility: 2; phase: 2; fourier transforms: 2; cocktail party problem: 2; speech coding: 2; continuous speech separation: 1; speaker diarization: 1; supervised learning: 1; regression analysis: 1; blind deconvolution: 1; rir estimation: 1; filtering theory: 1; continuous speaker separation: 1; gaussian processes: 1; masking based beamforming: 1; gammatone frequency cepstral coefficient (gfcc): 1; robust speaker recognition: 1; x vector: 1; transient response: 1; phase estimation: 1; chimera++ networks: 1; blind source separation: 1; spatial features: 1; permutation invariant training: 1; steered response power: 1; gcc phat: 1; acoustic noise: 1; signal reconstruction: 1; dereverberation: 1; denoising: 1; speaker separation: 1; chimera++ networks: 1; phase reconstruction: 1; spatial clustering: 1; speaker independent speech separation: 1; chimera network: 1; pattern clustering: 1; speaker independent multi talker speech separation: 1; transfer functions: 1; chime 4: 1; frequency estimation: 1; relative transfer function estimation: 1; wiener filters: 1; eigenvalues and eigenfunctions: 1; emotion recognition: 1; signal classification: 1; speech emotion recognition: 1; pooling: 1; kernel extreme learning machine: 1; speech age/gender recognition: 1; recurrent neural nets: 1; recurrent neural networks: 1; deep stacking networks: 1; unsupervised speaker adaptation: 1; batch normalization: 1; chime 3: 1; spectral mapping: 1; signal denoising: 1; chime 2: 1; backpropagation: 1; joint training: 1; deep neural networks (dnn): 1; unsupervised learning: 1; robust automatic speech recognition: 1; ideal binary mask: 1; cnn: 1; dnn: 1; signal approximation: 1
Most Publications: 2022: 20; 2021: 13; 2018: 13; 2023: 8; 2020: 8


ICASSP2022 Zhong-Qiu Wang, DeLiang Wang, 
Localization based Sequential Grouping for Continuous Speech Separation.

Interspeech2022 Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao 0001, Yanmin Qian, Shinji Watanabe 0001, 
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.

TASLP2021 Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux, 
Convolutive Prediction for Monaural Speech Dereverberation and Noisy-Reverberant Speaker Separation.

ICASSP2021 Zhong-Qiu Wang, DeLiang Wang, 
Count And Separate: Incorporating Speaker Counting For Continuous Speaker Separation.

TASLP2020 Hassan Taherian, Zhong-Qiu Wang, Jorge Chang, DeLiang Wang, 
Robust Speaker Recognition Based on Single-Channel and Multi-Channel Speech Enhancement.

TASLP2020 Zhong-Qiu Wang, DeLiang Wang, 
Deep Learning Based Target Cancellation for Speech Dereverberation.

TASLP2019 Zhong-Qiu Wang, DeLiang Wang, 
Combining Spectral and Spatial Features for Deep Learning Based Blind Speaker Separation.

TASLP2019 Zhong-Qiu Wang, Xueliang Zhang 0001, DeLiang Wang, 
Robust Speaker Localization Guided by Deep Learning-Based Time-Frequency Masking.

TASLP2019 Yan Zhao 0010, Zhong-Qiu Wang, DeLiang Wang, 
Two-Stage Deep Learning for Noisy-Reverberant Speech Enhancement.

ICASSP2019 Zhong-Qiu Wang, Ke Tan 0001, DeLiang Wang, 
Deep Learning Based Phase Reconstruction for Speaker Separation: A Trigonometric Perspective.

Interspeech2019 Hassan Taherian, Zhong-Qiu Wang, DeLiang Wang, 
Deep Learning Based Multi-Channel Speaker Recognition in Noisy and Reverberant Environments.

ICASSP2018 Zhong-Qiu Wang, Jonathan Le Roux, John R. Hershey, 
Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation.

ICASSP2018 Zhong-Qiu Wang, Jonathan Le Roux, John R. Hershey, 
Alternative Objective Functions for Deep Clustering.

ICASSP2018 Zhong-Qiu Wang, DeLiang Wang, 
Mask Weighted Stft Ratios for Relative Transfer Function Estimation and ITS Application to Robust ASR.

Interspeech2018 Zhong-Qiu Wang, Jonathan Le Roux, DeLiang Wang, John R. Hershey, 
End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction.

Interspeech2018 Zhong-Qiu Wang, DeLiang Wang, 
Integrating Spectral and Spatial Features for Multi-Channel Speaker Separation.

Interspeech2018 Zhong-Qiu Wang, DeLiang Wang, 
All-Neural Multi-Channel Speech Enhancement.

Interspeech2018 Zhong-Qiu Wang, Xueliang Zhang 0001, DeLiang Wang, 
Robust TDOA Estimation Based on Time-Frequency Masking and Deep Neural Networks.

ICASSP2017 Zhong-Qiu Wang, Ivan Tashev, 
Learning utterance-level representations for speech emotion and age/gender recognition using deep neural networks.

ICASSP2017 Zhong-Qiu Wang, DeLiang Wang, 
Recurrent deep stacking networks for supervised speech separation.

#172  | Nima Mesgarani | Google Scholar   DBLP
Venues: Interspeech: 11; ICASSP: 9; TASLP: 3; NeurIPS: 1; ICML: 1
Years: 2021: 10; 2020: 2; 2019: 1; 2018: 6; 2017: 3; 2016: 3
ISCA Sections: source separation: 4; deep learning for source separation and pitch tracking: 2; voice conversion and adaptation: 1; monaural source separation: 1; perspective talk: 1; acoustic model adaptation: 1; feature extraction and acoustic modeling using neural networks for asr: 1
IEEE Keywords: source separation: 6; signal representation: 4; speech recognition: 3; speech coding: 3; decoding: 2; lightweight: 2; group communication: 2; hearing: 2; mimo communication: 2; neural network: 2; blind source separation: 2; speech separation: 2; real time: 2; audio signal processing: 2; time frequency analysis: 2; single channel: 2; deep clustering: 2; neural net architecture: 1; computational complexity: 1; context codec: 1; cocktail party problem: 1; anechoic chambers (acoustic): 1; speaker and direction inferred separation: 1; dual channel speech separation: 1; separation: 1; wireless channels: 1; recurrent neural nets: 1; speech enhancement: 1; binaural speech separation: 1; hearing aids: 1; augmented reality: 1; interaural cues: 1; time domain: 1; linear codes: 1; multi talker: 1; signal reconstruction: 1; attractor network: 1; raw waveform: 1; natural language processing: 1; lip reading: 1; speech synthesis: 1; speech compression: 1; medical signal processing: 1; ecog: 1; auditory neuroscience: 1; public domain software: 1; bioelectric potentials: 1; magnetoencephalography: 1; eeg: 1; brain mapping: 1; real time processing: 1; brain computer interfaces: 1; estimation theory: 1; music separation: 1; speaker recognition: 1; approximation theory: 1; music: 1; pattern clustering: 1; singing voice separation: 1; signal classification: 1; phoneme recognition: 1; behavioural sciences computing: 1; synaptic depression: 1
Most Publications: 2021: 15; 2020: 11; 2017: 8; 2018: 7; 2016: 7


TASLP2021 Yi Luo 0004, Cong Han, Nima Mesgarani
Group Communication With Context Codec for Lightweight Source Separation.

ICASSP2021 Chenxing Li, Jiaming Xu 0001, Nima Mesgarani, Bo Xu 0002, 
Speaker and Direction Inferred Dual-Channel Speech Separation.

ICASSP2021 Yi Luo, Zhuo Chen, Cong Han, Chenda Li, Tianyan Zhou, Nima Mesgarani
Rethinking The Separation Layers In Speech Separation Networks.

ICASSP2021 Yi Luo, Cong Han, Nima Mesgarani
Ultra-Lightweight Speech Separation Via Group Communication.

Interspeech2021 Cong Han, Yi Luo 0004, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe 0001, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen 0006, 
Continuous Speech Separation Using Speaker Inventory for Long Recording.

Interspeech2021 Cong Han, Yi Luo, Nima Mesgarani
Binaural Speech Separation of Moving Speakers With Preserved Spatial Cues.

Interspeech2021 Yinghao Aaron Li, Ali Zare, Nima Mesgarani
StarGANv2-VC: A Diverse, Unsupervised, Non-Parallel Framework for Natural-Sounding Voice Conversion.

Interspeech2021 Yi Luo, Cong Han, Nima Mesgarani
Empirical Analysis of Generalized Iterative Speech Separation Networks.

Interspeech2021 Yi Luo, Nima Mesgarani
Implicit Filter-and-Sum Network for End-to-End Multi-Channel Speech Separation.

NeurIPS2021 Menoua Keshishian, Samuel Norman-Haignere, Nima Mesgarani
Understanding Adaptive, Multiscale Temporal Integration In Deep Speech Recognition Systems.

ICASSP2020 Cong Han, Yi Luo 0004, Nima Mesgarani
Real-Time Binaural Speech Separation with Preserved Spatial Cues.

Interspeech2020 Yi Luo 0004, Nima Mesgarani
Separating Varying Numbers of Sources with Auxiliary Autoencoding Loss.

TASLP2019 Yi Luo 0004, Nima Mesgarani
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation.

TASLP2018 Yi Luo 0004, Zhuo Chen 0006, Nima Mesgarani
Speaker-Independent Speech Separation With Deep Attractor Network.

ICASSP2018 Yi Luo 0004, Nima Mesgarani
TaSNet: Time-Domain Audio Separation Network for Real-Time, Single-Channel Speech Separation.

ICASSP2018 Hassan Akbari, Himani Arora, Liangliang Cao, Nima Mesgarani
Lip2Audspec: Speech Reconstruction from Silent Lip Movements Video.

Interspeech2018 Yi Luo 0004, Nima Mesgarani
Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network.

Interspeech2018 Rajath Kumar, Yi Luo 0004, Nima Mesgarani
Music Source Activity Detection and Separation Using Deep Attractor Network.

Interspeech2018 Nima Mesgarani
Speech Processing in the Human Brain Meets Deep Learning.

ICASSP2017 Bahar Khalighinejad, Tasha Nagamine, Ashesh D. Mehta, Nima Mesgarani
NAPLib: An open source toolbox for real-time and offline Neural Acoustic Processing.

#173  | Takafumi Moriya | Google Scholar   DBLP
Venues | Interspeech: 17, ICASSP: 7, TASLP: 1
Years | 2022: 7, 2021: 6, 2020: 3, 2019: 6, 2018: 3
ISCA Section | novel models and training methods for asr: 2; adjusting to speaker, accent, and domain: 2; speech representation: 1; multi-, cross-lingual and other topics in asr: 1; single-channel speech enhancement: 1; speech perception: 1; streaming for asr/rnn transducers: 1; source separation, dereverberation and echo cancellation: 1; search/decoding techniques and confidence measures for asr: 1; asr neural network architectures and training: 1; speech and audio classification: 1; model training for asr: 1; nn architectures for asr: 1; spoken term detection, confidence measure, and end-to-end speech recognition: 1; selected topics in neural speech processing: 1
IEEE Keyword | speech recognition: 8; recurrent neural nets: 3; neural network: 3; recurrent neural network transducer: 2; end to end: 2; natural language processing: 2; probability: 2; end to end automatic speech recognition: 2; attention based decoder: 1; noise robust speech recognition: 1; speech extraction: 1; speech enhancement: 1; speakerbeam: 1; speech separation: 1; deep learning (artificial intelligence): 1; input switching: 1; emotion recognition: 1; perceived emotion: 1; listener adaptation: 1; speech emotion recognition: 1; synchronisation: 1; whole network pre training: 1; entropy: 1; autoregressive processes: 1; transformer: 1; sequence level consistency training: 1; specaugment: 1; semi supervised learning: 1; speech codecs: 1; knowledge distillation: 1; connectionist temporal classification: 1; attention weight: 1; covariance matrix adaptation evolution strategy (cma es): 1; multi objective optimization: 1; deep neural network (dnn): 1; evolutionary computation: 1; genetic algorithm: 1; pareto optimisation: 1; hidden markov models: 1; cloud computing: 1; parallel processing: 1; speech coding: 1; attention based encoder decoder: 1; hierarchical recurrent encoder decoder: 1
Most Publications | 2022: 14, 2019: 9, 2023: 8, 2021: 8, 2018: 7

Affiliations
URLs

ICASSP2022 Takafumi Moriya, Takanori Ashihara, Atsushi Ando, Hiroshi Sato, Tomohiro Tanaka, Kohei Matsuura, Ryo Masumura, Marc Delcroix, Takahiro Shinozaki, 
Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration.

ICASSP2022 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Naoyuki Kamo, Takafumi Moriya
Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition.

Interspeech2022 Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, 
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models.

Interspeech2022 Ryo Masumura, Yoshihiro Yamazaki, Saki Mizuno, Naoki Makishima, Mana Ihori, Mihiro Uchida, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Shota Orihashi, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando, 
End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training.

Interspeech2022 Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki, 
Streaming Target-Speaker ASR with Neural Transducer.

Interspeech2022 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura, 
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.

Interspeech2022 Tomohiro Tanaka, Ryo Masumura, Hiroshi Sato, Mana Ihori, Kohei Matsuura, Takanori Ashihara, Takafumi Moriya
Domain Adversarial Self-Supervised Speech Representation Learning for Improving Unknown Domain Downstream Tasks.

ICASSP2021 Atsushi Ando, Ryo Masumura, Hiroshi Sato, Takafumi Moriya, Takanori Ashihara, Yusuke Ijima, Tomoki Toda, 
Speech Emotion Recognition Based on Listener Adaptive Models.

ICASSP2021 Takafumi Moriya, Takanori Ashihara, Tomohiro Tanaka, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Yusuke Ijima, Ryo Masumura, Yusuke Shinohara, 
Simpleflat: A Simple Whole-Network Pre-Training Approach for RNN Transducer-Based End-to-End Speech Recognition.

Interspeech2021 Takanori Ashihara, Takafumi Moriya, Makio Kashino, 
Investigating the Impact of Spectral and Temporal Degradation on End-to-End Automatic Speech Recognition Performance.

Interspeech2021 Takafumi Moriya, Tomohiro Tanaka, Takanori Ashihara, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Ryo Masumura, Marc Delcroix, Taichi Asami, 
Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture.

Interspeech2021 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoyuki Kamo, 
Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition.

Interspeech2021 Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Takafumi Moriya, Takanori Ashihara, Shota Orihashi, Naoki Makishima, 
Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition.

ICASSP2020 Ryo Masumura, Mana Ihori, Akihiko Takashima, Takafumi Moriya, Atsushi Ando, Yusuke Shinohara, 
Sequence-Level Consistency Training for Semi-Supervised End-to-End Automatic Speech Recognition.

ICASSP2020 Takafumi Moriya, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara, 
Distilling Attention Weights for CTC-Based ASR Systems.

Interspeech2020 Takafumi Moriya, Tsubasa Ochiai, Shigeki Karita, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara, Marc Delcroix, 
Self-Distillation for Improving CTC-Transformer-Based ASR Systems.

TASLP2019 Takafumi Moriya, Tomohiro Tanaka, Takahiro Shinozaki, Shinji Watanabe 0001, Kevin Duh, 
Evolution-Strategy-Based Automation of System Development for High-Performance Speech Recognition.

ICASSP2019 Ryo Masumura, Tomohiro Tanaka, Takafumi Moriya, Yusuke Shinohara, Takanobu Oba, Yushi Aono, 
Large Context End-to-end Automatic Speech Recognition via Extension of Hierarchical Recurrent Encoder-decoder Models.

Interspeech2019 Takanori Ashihara, Yusuke Shinohara, Hiroshi Sato, Takafumi Moriya, Kiyoaki Matsui, Takaaki Fukutomi, Yoshikazu Yamaguchi, Yushi Aono, 
Neural Whispered Speech Detection with Imbalanced Learning.

Interspeech2019 Ryo Masumura, Hiroshi Sato, Tomohiro Tanaka, Takafumi Moriya, Yusuke Ijima, Takanobu Oba, 
End-to-End Automatic Speech Recognition with a Reconstruction Criterion Using Speech-to-Text and Text-to-Speech Encoder-Decoders.

#174  | Shuai Wang 0016 | Google Scholar   DBLP
Venues | Interspeech: 11, ICASSP: 10, TASLP: 4
Years | 2022: 1, 2021: 5, 2020: 8, 2019: 7, 2018: 3, 2017: 1
ISCA Section | speaker recognition: 2; embedding and network architecture for speaker recognition: 1; speaker recognition challenges and applications: 1; learning techniques for speaker recognition: 1; anti-spoofing and liveness detection: 1; speaker recognition and diarization: 1; speaker recognition and anti-spoofing: 1; the 2019 automatic speaker verification spoofing and countermeasures challenge: 1; speaker verification using neural network methods: 1; short utterances speaker recognition: 1
IEEE Keyword | speaker recognition: 12; speaker verification: 5; data augmentation: 4; hidden markov models: 3; data handling: 3; teacher student learning: 2; convolutional neural networks: 2; i vector: 2; text dependent speaker verification: 2; x vector: 2; speaker embedding: 2; speaker diarization: 2; bayes methods: 2; hmm: 2; dihard: 2; pattern clustering: 2; variational bayes: 2; gaussian processes: 2; triplet loss: 2; voice activity detection: 1; speech activity detection. weakly supervised learning: 1; teacher training: 1; speech recognition: 1; multi modal system: 1; face recognition: 1; biometrics (access control): 1; deep learning (artificial intelligence): 1; audio visual deep neural network: 1; data analysis: 1; person verification: 1; domain adaptation: 1; unsupervised learning: 1; contrastive learning: 1; self supervised learning: 1; speech synthesis: 1; unit selection synthesis: 1; variational auto encoder: 1; text independent speaker verification: 1; generative adversarial network: 1; on the fly data augmentation: 1; specaugment: 1; convolutional neural nets: 1; channel information: 1; adversarial training: 1; multitask learning: 1; optimisation: 1; probability: 1; linear discriminant analysis: 1; chime: 1; inference mechanisms: 1; speaker neural embedding: 1; angular softmax: 1; center loss: 1; short duration text independent speaker verification: 1; knowledge distillation: 1; computer aided instruction: 1; recurrent neural nets: 1; end to end: 1; hard trial selection: 1; expectation maximisation algorithm: 1; speech intelligibility: 1; dilated convolution: 1; co channel speaker identification: 1; convolution: 1; focal loss: 1; feedforward neural nets: 1
Most Publications | 2020: 11, 2019: 11, 2021: 10, 2018: 9, 2022: 3

Affiliations
Shanghai Jiao Tong University, Department of Computer Science and Engineering, China
URLs

Interspeech2022 Bei Liu, Zhengyang Chen, Shuai Wang 0016, Haoyu Wang, Bing Han, Yanmin Qian, 
DF-ResNet: Boosting Speaker Verification Performance with Depth-First Design.

TASLP2021 Heinrich Dinkel, Shuai Wang 0016, Xuenan Xu, Mengyue Wu, Kai Yu 0004, 
Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training.

TASLP2021 Yanmin Qian, Zhengyang Chen, Shuai Wang 0016
Audio-Visual Deep Neural Network for Robust Person Verification.

ICASSP2021 Zhengyang Chen, Shuai Wang 0016, Yanmin Qian, 
Self-Supervised Learning Based Domain Adaptation for Robust Speaker Verification.

ICASSP2021 Chenpeng Du, Bing Han, Shuai Wang 0016, Yanmin Qian, Kai Yu 0004, 
SynAug: Synthesis-Based Data Augmentation for Text-Dependent Speaker Verification.

ICASSP2021 Houjun Huang, Xu Xiang, Fei Zhao, Shuai Wang 0016, Yanmin Qian, 
Unit Selection Synthesis Based Data Augmentation for Fixed Phrase Speaker Verification.

TASLP2020 Shuai Wang 0016, Yexin Yang, Zhanghao Wu, Yanmin Qian, Kai Yu 0004, 
Data Augmentation Using Deep Generative Models for Embedding Based Speaker Recognition.

ICASSP2020 Shuai Wang 0016, Johan Rohdin, Oldrich Plchot, Lukás Burget, Kai Yu 0004, Jan Cernocký, 
Investigation of Specaugment for Deep Speaker Embedding Learning.

ICASSP2020 Zhengyang Chen, Shuai Wang 0016, Yanmin Qian, Kai Yu 0004, 
Channel Invariant Speaker Embedding Learning with Joint Multi-Task and Adversarial Training.

ICASSP2020 Mireia Díez, Lukás Burget, Federico Landini, Shuai Wang 0016, Honza Cernocký, 
Optimizing Bayesian Hmm Based X-Vector Clustering for the Second Dihard Speech Diarization Challenge.

ICASSP2020 Federico Landini, Shuai Wang 0016, Mireia Díez, Lukás Burget, Pavel Matejka, Katerina Zmolíková, Ladislav Mosner, Anna Silnova, Oldrich Plchot, Ondrej Novotný, Hossein Zeinali, Johan Rohdin, 
But System for the Second Dihard Speech Diarization Challenge.

Interspeech2020 Zhengyang Chen, Shuai Wang 0016, Yanmin Qian, 
Multi-Modality Matters: A Performance Leap on VoxCeleb.

Interspeech2020 Zhengyang Chen, Shuai Wang 0016, Yanmin Qian, 
Adversarial Domain Adaptation for Speaker Verification Using Partially Shared Network.

Interspeech2020 Hongji Wang, Heinrich Dinkel, Shuai Wang 0016, Yanmin Qian, Kai Yu 0004, 
Dual-Adversarial Domain Adaptation for Generalized Replay Attack Detection.

TASLP2019 Shuai Wang 0016, Zili Huang, Yanmin Qian, Kai Yu 0004, 
Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification.

ICASSP2019 Shuai Wang 0016, Yexin Yang, Tianzhe Wang, Yanmin Qian, Kai Yu 0004, 
Knowledge Distillation for Small Foot-print Deep Speaker Embedding.

Interspeech2019 Mireia Díez, Lukás Burget, Shuai Wang 0016, Johan Rohdin, Jan Cernocký, 
Bayesian HMM Based x-Vector Clustering for Speaker Diarization.

Interspeech2019 Hongji Wang, Heinrich Dinkel, Shuai Wang 0016, Yanmin Qian, Kai Yu 0004, 
Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial Training.

Interspeech2019 Shuai Wang 0016, Johan Rohdin, Lukás Burget, Oldrich Plchot, Yanmin Qian, Kai Yu 0004, Jan Cernocký, 
On the Usage of Phonetic Information for Text-Independent Speaker Embedding Extraction.

Interspeech2019 Zhanghao Wu, Shuai Wang 0016, Yanmin Qian, Kai Yu 0004, 
Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification.

#175  | Bin Liu 0041 | Google Scholar   DBLP
Venues | Interspeech: 16, ICASSP: 6, TASLP: 3
Years | 2022: 1, 2021: 5, 2020: 8, 2019: 6, 2018: 1, 2017: 2, 2016: 2
ISCA Section | speech emotion recognition: 3; health and affect: 1; multi-channel audio and emotion recognition: 1; speech enhancement: 1; speech in multimodality: 1; speech in health: 1; speech and audio source separation and scene analysis: 1; emotion and personality in conversation: 1; audio signal characterization: 1; asr for noisy and far-field speech: 1; speech and voice disorders: 1; deep enhancement: 1; prosody and text processing: 1; speech enhancement and noise reduction: 1
IEEE Keyword | speech recognition: 4; emotion recognition: 3; multimodal fusion: 2; recurrent neural nets: 2; digital health: 1; end to end: 1; diseases: 1; microorganisms: 1; covid 19: 1; robust end to end speech recognition: 1; speech distortion: 1; speech enhancement: 1; speech transformer: 1; gated recurrent fusion: 1; iterative methods: 1; inverse problems: 1; vocal tract: 1; filtering theory: 1; autoregressive processes: 1; speech synthesis: 1; source filter model: 1; arx lf model: 1; glottal source: 1; signal denoising: 1; conversational transformer network (ctnet): 1; context sensitive modeling: 1; speaker sensitive modeling: 1; speaker recognition: 1; signal classification: 1; conversational emotion recognition: 1; self attention: 1; cross attention: 1; speech emotion recognition: 1; audio visual systems: 1; transformer: 1; continuous emotion recognition: 1; model level fusion: 1; video signal processing: 1; multi head attention: 1; image fusion: 1; keyword spotting: 1; double edge triggered detecting method: 1; human computer interaction: 1; focal loss: 1; blstm rnn: 1; joint training: 1; pitch estimation: 1; feature mapping: 1; bottleneck features: 1; image sequences: 1; real time magnetic resonance imaging sequences: 1; biomedical mri: 1; boundary intensity map: 1; real time systems: 1; splines (mathematics): 1; medical image processing: 1; tongue contour extraction: 1
Most Publications | 2020: 17, 2019: 16, 2021: 15, 2018: 10, 2022: 9

Affiliations
Chinese Academy of Sciences, Institute of Automation, National Laboratory of Pattern Recognition, Beijing, China
URLs

ICASSP2022 Cong Cai, Bin Liu 0041, Jianhua Tao, Zhengkun Tian, Jiahao Lu, Kexin Wang, 
End-to-End Network Based on Transformer for Automatic Detection of Covid-19.

TASLP2021 Cunhang Fan, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Bin Liu 0041, Zhengqi Wen, 
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition.

TASLP2021 Yongwei Li, Jianhua Tao, Donna Erickson, Bin Liu 0041, Masato Akagi, 
F0-Noise-Robust Glottal Source and Vocal Tract Analysis Based on ARX-LF Model.

TASLP2021 Zheng Lian, Bin Liu 0041, Jianhua Tao, 
CTNet: Conversational Transformer Network for Emotion Recognition.

ICASSP2021 Licai Sun, Bin Liu 0041, Jianhua Tao, Zheng Lian, 
Multimodal Cross- and Self-Attention Network for Speech Emotion Recognition.

Interspeech2021 Cong Cai, Mingyue Niu, Bin Liu 0041, Jianhua Tao, Xuefei Liu, 
TDCA-Net: Time-Domain Channel Attention Network for Depression Detection.

ICASSP2020 Jian Huang 0014, Jianhua Tao, Bin Liu 0041, Zheng Lian, Mingyue Niu, 
Multimodal Transformer Fusion for Continuous Emotion Recognition.

Interspeech2020 Cunhang Fan, Jianhua Tao, Bin Liu 0041, Jiangyan Yi, Zhengqi Wen, 
Gated Recurrent Fusion of Spatial and Spectral Features for Multi-Channel Speech Separation with Deep Embedding Representations.

Interspeech2020 Cunhang Fan, Jianhua Tao, Bin Liu 0041, Jiangyan Yi, Zhengqi Wen, 
Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations.

Interspeech2020 Jian Huang 0014, Jianhua Tao, Bin Liu 0041, Zheng Lian, 
Learning Utterance-Level Representations with Label Smoothing for Speech Emotion Recognition.

Interspeech2020 Yongwei Li, Jianhua Tao, Bin Liu 0041, Donna Erickson, Masato Akagi, 
Comparison of Glottal Source Parameter Values in Emotional Vowels.

Interspeech2020 Zheng Lian, Jianhua Tao, Bin Liu 0041, Jian Huang 0014, Zhanlei Yang, Rongjun Li, 
Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition.

Interspeech2020 Zheng Lian, Jianhua Tao, Bin Liu 0041, Jian Huang 0014, Zhanlei Yang, Rongjun Li, 
Conversational Emotion Recognition Using Self-Attention Mechanisms and Graph Neural Networks.

Interspeech2020 Ziping Zhao 0001, Qifei Li, Nicholas Cummins, Bin Liu 0041, Haishuai Wang, Jianhua Tao, Björn W. Schuller, 
Hybrid Network Feature Extraction for Depression Assessment from Speech.

ICASSP2019 Bin Liu 0041, Shuai Nie, Yaping Zhang, Shan Liang, Zhanlei Yang, Wenju Liu, 
Loss and Double-edge-triggered Detector for Robust Small-footprint Keyword Spotting.

Interspeech2019 Cunhang Fan, Bin Liu 0041, Jianhua Tao, Jiangyan Yi, Zhengqi Wen, 
Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features.

Interspeech2019 Zheng Lian, Jianhua Tao, Bin Liu 0041, Jian Huang 0014, 
Conversational Emotion Analysis via Attention Mechanisms.

Interspeech2019 Zheng Lian, Jianhua Tao, Bin Liu 0041, Jian Huang 0014, 
Unsupervised Representation Learning with Future Observation Prediction for Speech Emotion Recognition.

Interspeech2019 Bin Liu 0041, Shuai Nie, Shan Liang, Wenju Liu, Meng Yu 0003, Lianwu Chen, Shouye Peng, Changliang Li, 
Jointly Adversarial Enhancement Training for Robust End-to-End Speech Recognition.

Interspeech2019 Mingyue Niu, Jianhua Tao, Bin Liu 0041, Cunhang Fan, 
Automatic Depression Level Detection via ℓp-Norm Pooling.

#176  | Rohit Prabhavalkar | Google Scholar   DBLP
Venues | Interspeech: 14, ICASSP: 11
Years | 2022: 6, 2021: 4, 2020: 3, 2019: 4, 2018: 4, 2017: 2, 2016: 2
ISCA Section | neural network acoustic models for asr: 2; asr: 1; search/decoding algorithms for asr: 1; speech analysis: 1; novel models and training methods for asr: 1; resource-constrained asr: 1; novel neural network architectures for asr: 1; noise robust and distant speech recognition: 1; cross-lingual and multilingual asr: 1; sequence-to-sequence speech recognition: 1; asr neural network architectures: 1; end-to-end speech recognition: 1; topics in speech recognition: 1
IEEE Keyword | speech recognition: 10; recurrent neural nets: 4; speech coding: 3; natural language processing: 2; speech enhancement: 2; automatic speech recognition: 2; vocabulary: 2; end to end speech recognition: 1; named entities: 1; class language model: 1; shallow fusion: 1; rnnt: 1; two pass asr: 1; long form asr: 1; speaker recognition: 1; end to end asr: 1; conformer: 1; echo suppression: 1; sequence to sequence model: 1; acoustic simulation: 1; multi task loss: 1; acoustic echo cancellation: 1; calibration: 1; mean square error methods: 1; voice activity detection: 1; attention based end to end models: 1; transformer: 1; confidence: 1; optimisation: 1; mobile handsets: 1; decoding: 1; channel bank filters: 1; generative adversarial networks: 1; probability: 1; lstm: 1; rnn: 1; model compression: 1; embedded speech recognition: 1; svd: 1
Most Publications | 2022: 17, 2019: 14, 2021: 13, 2018: 13, 2017: 11

Affiliations
URLs

ICASSP2022 Antoine Bruguier, Duc Le, Rohit Prabhavalkar, Dangna Li, Zhe Liu 0011, Bo Wang, Eun Chang, Fuchun Peng, Ozlem Kalinli, Michael L. Seltzer, 
Neural-FST Class Language Model for End-to-End Speech Recognition.

ICASSP2022 Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.

Interspeech2022 Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang 0016, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman, 
A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes.

Interspeech2022 Ke Hu, Tara N. Sainath, Yanzhang He, Rohit Prabhavalkar, Trevor Strohman, Sepand Mavandadi, Weiran Wang, 
Improving Deliberation by Text-Only and Semi-Supervised Training.

Interspeech2022 W. Ronny Huang, Shuo-Yiin Chang, David Rybach, Tara N. Sainath, Rohit Prabhavalkar, Cal Peyser, Zhiyun Lu, Cyril Allauzen, 
E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR.

Interspeech2022 Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, W. Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach, 
Improving Rare Word Recognition with LM-aware MWER Training.

ICASSP2021 Nathan Howard, Alex Park 0001, Turaj Zakizadeh Shabestary, Alexander Gruenstein, Rohit Prabhavalkar
A Neural Acoustic Echo Canceller Optimized Using An Automatic Speech Recognizer and Large Scale Synthetic Data.

ICASSP2021 David Qiu, Qiujia Li, Yanzhang He, Yu Zhang 0033, Bo Li 0028, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li 0133, Ke Hu, Tara N. Sainath, Ian McGraw, 
Learning Word-Level Confidence for Subword End-To-End ASR.

Interspeech2021 Yuan Shangguan, Rohit Prabhavalkar, Hang Su, Jay Mahadeokar, Yangyang Shi, Jiatong Zhou, Chunyang Wu, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer, 
Dissecting User-Perceived Latency of On-Device E2E Speech Recognition.

Interspeech2021 Yangyang Shi, Varun Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, Rohit Prabhavalkar, Alex Xiao, Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer, 
Dynamic Encoder Transducer: A Flexible Solution for Trading Off Accuracy for Latency.

ICASSP2020 Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prabhavalkar
Deliberation Model Based Two-Pass End-To-End Speech Recognition.

ICASSP2020 Tara N. Sainath, Yanzhang He, Bo Li 0028, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li 0133, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alexander Gruenstein, Ke Hu, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirkó Visontai, Yonghui Wu, Yu Zhang 0033, Ding Zhao, 
A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency.

Interspeech2020 Antoine Bruguier, Ananya Misra, Arun Narayanan, Rohit Prabhavalkar
Anti-Aliasing Regularization in Stacking Layers.

ICASSP2019 Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, Qiao Liang, Deepti Bhatia, Yuan Shangguan, Bo Li 0028, Golan Pundak, Khe Chai Sim, Tom Bagby, Shuo-Yiin Chang, Kanishka Rao, Alexander Gruenstein, 
Streaming End-to-end Speech Recognition for Mobile Devices.

Interspeech2019 Ke Hu, Antoine Bruguier, Tara N. Sainath, Rohit Prabhavalkar, Golan Pundak, 
Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models.

Interspeech2019 Kazuki Irie, Rohit Prabhavalkar, Anjuli Kannan, Antoine Bruguier, David Rybach, Patrick Nguyen, 
On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition.

Interspeech2019 Tara N. Sainath, Ruoming Pang, David Rybach, Yanzhang He, Rohit Prabhavalkar, Wei Li 0133, Mirkó Visontai, Qiao Liang, Trevor Strohman, Yonghui Wu, Ian McGraw, Chung-Cheng Chiu, 
Two-Pass End-to-End Speech Recognition.

ICASSP2018 Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Ekaterina Gonina, Navdeep Jaitly, Bo Li 0028, Jan Chorowski, Michiel Bacchiani, 
State-of-the-Art Speech Recognition with Sequence-to-Sequence Models.

ICASSP2018 Chris Donahue, Bo Li 0028, Rohit Prabhavalkar
Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition.

ICASSP2018 Tara N. Sainath, Rohit Prabhavalkar, Shankar Kumar, Seungji Lee, Anjuli Kannan, David Rybach, Vlad Schogol, Patrick Nguyen, Bo Li 0028, Yonghui Wu, Zhifeng Chen, Chung-Cheng Chiu, 
No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models.

#177  | Yossi Adi | Google Scholar   DBLP
Venues | Interspeech: 13, ICASSP: 7, ACL: 2, NAACL: 1, ICML: 1, NeurIPS: 1
Years | 2022: 9, 2021: 3, 2020: 6, 2019: 1, 2018: 1, 2017: 4, 2016: 1
ISCA Section | single-channel speech enhancement: 2; phonation and voice quality: 2; acoustic signal representation and analysis: 1; spoken language processing: 1; zero, low-resource and multi-modal speech recognition: 1; single-channel and multi-channel speech enhancement: 1; speech synthesis: 1; privacy and security in speech communication: 1; phonetic event detection and segmentation: 1; voice conversion and adaptation: 1; lexical and pronunciation modeling: 1
IEEE Keyword | speaker recognition: 3; speech enhancement: 2; sequence segmentation: 2; neural net architecture: 2; structured prediction: 2; recurrent neural networks (rnns): 2; domain adaptation: 1; unsupervised denoising: 1; unsupervised learning: 1; self supervised learning: 1; zero shot learning: 1; reverberation: 1; signal classification: 1; source separation: 1; speaker classification: 1; audio generation: 1; speech synthesis: 1; speech recognition: 1; convergence: 1; signal representation: 1; natural language processing: 1; phoneme boundary detection: 1; adversarial learning: 1; multi task learning: 1; error statistics: 1; automatic speech recognition: 1; automatic speaker verification: 1; security of data: 1; adversarial examples: 1; prediction theory: 1; recurrent neural nets: 1; voice onset time: 1; word segmentation: 1
Most Publications | 2022: 33, 2021: 16, 2020: 16, 2023: 10, 2017: 9

Affiliations
URLs

ICASSP2022 Efthymios Tzinis, Yossi Adi, Vamsi K. Ithapu, Buye Xu, Anurag Kumar 0003, 
Continual Self-Training With Bootstrapped Remixing For Speech Enhancement.

Interspeech2022 Shahaf Bassan, Yossi Adi, Jeffrey S. Rosenschein, 
Unsupervised Symbolic Music Segmentation using Ensemble Temporal Prediction Errors.

Interspeech2022 Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino 0001, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee 0001, 
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation.

Interspeech2022 Maureen de Seyssel, Marvin Lavechin, Yossi Adi, Emmanuel Dupoux, Guillaume Wisniewski, 
Probing phoneme, language and speaker information in unsupervised speech representations.

Interspeech2022 Or Tal, Moshe Mandel, Felix Kreuk, Yossi Adi
A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement.

Interspeech2022 Arnon Turetzky, Tzvi Michelson, Yossi Adi, Shmuel Peleg, 
Deep Audio Waveform Prior.

ACL2022 Eugene Kharitonov, Ann Lee 0001, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu, 
Text-Free Prosody-Aware Generative Spoken Language Modeling.

ACL2022 Ann Lee 0001, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Sravya Popuri, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang 0002, Juan Pino 0001, Wei-Ning Hsu, 
Direct Speech-to-Speech Translation With Discrete Units.

NAACL2022 Ann Lee 0001, Hongyu Gong, Paul-Ambroise Duquenne, Holger Schwenk, Peng-Jen Chen, Changhan Wang, Sravya Popuri, Yossi Adi, Juan Miguel Pino, Jiatao Gu, Wei-Ning Hsu, 
Textless Speech-to-Speech Translation on Real Data.

ICASSP2021 Shlomo E. Chazan, Lior Wolf, Eliya Nachmani, Yossi Adi
Single Channel Voice Separation for Unknown Number of Speakers Under Reverberant and Noisy Settings.

ICASSP2021 Adam Polyak, Lior Wolf, Yossi Adi, Ori Kabeli, Yaniv Taigman, 
High Fidelity Speech Regeneration with Application to Speech Enhancement.

Interspeech2021 Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, 
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.

ICASSP2020 Felix Kreuk, Yaniv Sheena, Joseph Keshet, Yossi Adi
Phoneme Boundary Detection Using Learnable Segmental Features.

Interspeech2020 Alexandre Défossez, Gabriel Synnaeve, Yossi Adi
Real Time Speech Enhancement in the Waveform Domain.

Interspeech2020 Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet, 
Hide and Speak: Towards Deep Neural Networks for Speech Steganography.

Interspeech2020 Felix Kreuk, Joseph Keshet, Yossi Adi
Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation.

Interspeech2020 Adam Polyak, Lior Wolf, Yossi Adi, Yaniv Taigman, 
Unsupervised Cross-Domain Singing Voice Conversion.

ICML2020 Eliya Nachmani, Yossi Adi, Lior Wolf, 
Voice Separation with an Unknown Number of Multiple Speakers.

ICASSP2019 Yossi Adi, Neil Zeghidour, Ronan Collobert, Nicolas Usunier, Vitaliy Liptchinsky, Gabriel Synnaeve, 
To Reverse the Gradient or Not: an Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition.

ICASSP2018 Felix Kreuk, Yossi Adi, Moustapha Cissé, Joseph Keshet, 
Fooling End-To-End Speaker Verification With Adversarial Examples.

#178  | Joon Son Chung | Google Scholar   DBLP
Venues | Interspeech: 15, ICASSP: 10
Years | 2022: 4, 2021: 6, 2020: 8, 2019: 3, 2018: 3, 2017: 1
ISCA Section | speaker diarization: 3; multimodal speech processing: 2; speaker and language recognition: 1; tools, corpora and resources: 1; bi- and multilinguality: 1; learning techniques for speaker recognition: 1; speech enhancement: 1; speaker recognition and diarization: 1; deep enhancement: 1; multimodal systems: 1; speaker verification: 1; speaker database and anti-spoofing: 1
IEEE Keyword | speaker recognition: 7; speaker verification: 4; graph theory: 2; graph attention networks: 2; speech recognition: 2; anti spoofing: 1; end to end: 1; audio spoofing detection: 1; heterogeneous: 1; keyword boosting: 1; contextual biasing: 1; keyword score: 1; beam search: 1; gaussian processes: 1; multi scale: 1; pattern clustering: 1; speaker diarisation: 1; domain adaptation: 1; entertainment: 1; signal classification: 1; graph neural network: 1; graph attention network: 1; optimisation: 1; entropy: 1; lip reading: 1; cross modal distillation: 1; filtering theory: 1; speaker representation: 1; triplet loss: 1; source separation: 1; cross modal learning: 1; audio visual systems: 1; signal representation: 1; selfsupervised machine learning: 1; cnns: 1; speech: 1
Most Publications | 2020: 30, 2022: 17, 2021: 17, 2019: 13, 2018: 10

Affiliations
URLs

ICASSP2022 Jee-weon Jung, Hee-Soo Heo, Hemlata Tak, Hye-jin Shim, Joon Son Chung, Bong-Jin Lee, Ha-Jin Yu, Nicholas W. D. Evans, 
AASIST: Audio Anti-Spoofing Using Integrated Spectro-Temporal Graph Attention Networks.

ICASSP2022 Namkyu Jung, Geonmin Kim, Joon Son Chung
Spell My Name: Keyword Boosted Speech Recognition.

ICASSP2022 Youngki Kwon, Hee-Soo Heo, Jee-Weon Jung, You Jin Kim, Bong-Jin Lee, Joon Son Chung
Multi-Scale Speaker Embedding-Based Graph Attention Networks For Speaker Diarisation.

Interspeech2022 Jee-weon Jung, You Jin Kim, Hee-Soo Heo, Bong-Jin Lee, Youngki Kwon, Joon Son Chung
Pushing the limits of raw waveform speaker recognition.

ICASSP2021 Andrew Brown 0006, Jaesung Huh, Arsha Nagrani, Joon Son Chung, Andrew Zisserman, 
Playing a Part: Speaker Verification at the movies.

ICASSP2021 Jee-weon Jung, Hee-Soo Heo, Ha-Jin Yu, Joon Son Chung
Graph Attention Networks for Speaker Verification.

ICASSP2021 Yoohwan Kwon, Hee-Soo Heo, Bong-Jin Lee, Joon Son Chung
The ins and outs of speaker recognition: lessons from VoxSRC 2020.

Interspeech2021 Jee-weon Jung, Hee-Soo Heo, Youngki Kwon, Joon Son Chung, Bong-Jin Lee, 
Three-Class Overlapped Speech Detection Using a Convolutional Recurrent Neural Network.

Interspeech2021 You Jin Kim, Hee-Soo Heo, Soyeon Choe, Soo-Whan Chung, Yoohwan Kwon, Bong-Jin Lee, Youngki Kwon, Joon Son Chung
Look Who's Talking: Active Speaker Detection in the Wild.

Interspeech2021 Youngki Kwon, Jee-weon Jung, Hee-Soo Heo, You Jin Kim, Bong-Jin Lee, Joon Son Chung
Adapting Speaker Embeddings for Speaker Diarisation.

ICASSP2020 Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman, 
ASR is All You Need: Cross-Modal Distillation for Lip Reading.

ICASSP2020 Seongkyu Mun, Soyeon Choe, Jaesung Huh, Joon Son Chung
The Sound of My Voice: Speaker Representation Loss for Target Voice Separation.

ICASSP2020 Arsha Nagrani, Joon Son Chung, Samuel Albanie, Andrew Zisserman, 
Disentangled Speech Embeddings Using Cross-Modal Self-Supervision.

Interspeech2020 Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman, 
Now You're Speaking My Language: Visual Language Identification.

Interspeech2020 Soo-Whan Chung, Soyeon Choe, Joon Son Chung, Hong-Goo Kang, 
FaceFilter: Audio-Visual Speech Separation Using Still Images.

Interspeech2020 Joon Son Chung, Jaesung Huh, Seongkyu Mun, Minjae Lee, Hee-Soo Heo, Soyeon Choe, Chiheon Ham, Sunghwan Jung, Bong-Jin Lee, Icksang Han, 
In Defence of Metric Learning for Speaker Recognition.

Interspeech2020 Joon Son Chung, Jaesung Huh, Arsha Nagrani, Triantafyllos Afouras, Andrew Zisserman, 
Spot the Conversation: Speaker Diarisation in the Wild.

Interspeech2020 Soo-Whan Chung, Hong-Goo Kang, Joon Son Chung
Seeing Voices and Hearing Voices: Learning Discriminative Embeddings Using Cross-Modal Self-Supervision.

ICASSP2019 Weidi Xie, Arsha Nagrani, Joon Son Chung, Andrew Zisserman, 
Utterance-level Aggregation for Speaker Recognition in the Wild.

Interspeech2019 Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman, 
My Lips Are Concealed: Audio-Visual Speech Enhancement Through Obstructions.

#179  | Michael L. Seltzer | Google Scholar   DBLP
Venues: Interspeech: 14 | ICASSP: 9 | TASLP: 1
Years: 2022: 4 | 2021: 7 | 2020: 4 | 2019: 2 | 2018: 2 | 2017: 3 | 2016: 2
ISCA Sections: novel neural network architectures for asr: 2 | resource-constrained asr: 2 | summarization, entity extraction, evaluation and others: 1 | spoken language understanding: 1 | neural transducers, streaming asr and novel asr models: 1 | language and lexical modeling for asr: 1 | streaming for asr/rnn transducers: 1 | asr neural network architectures: 1 | lexicon and language model for speech recognition: 1 | neural network training strategies for asr: 1 | acoustic model adaptation: 1 | feature extraction and acoustic modeling using neural networks for asr: 1
IEEE Keywords: speech recognition: 10 | natural language processing: 6 | acoustic modeling: 3 | recurrent neural nets: 2 | end to end speech recognition: 2 | hybrid speech recognition: 2 | recurrent neural networks: 2 | end to end speech recognition: 1 | named entities: 1 | class language model: 1 | shallow fusion: 1 | leveraging unpaired text: 1 | streaming end to end speech recognition: 1 | language model fusion: 1 | rnn t: 1 | generative adversarial network: 1 | accent invariance: 1 | unsupervised learning: 1 | chenones: 1 | graphemic pronunciation learning: 1 | transformer: 1 | decoding: 1 | class based language model: 1 | weighted finite state transducer: 1 | token passing: 1 | language translation: 1 | character sets: 1 | multilingual: 1 | language universal: 1 | linguistics: 1 | spatial smoothing: 1 | resnet: 1 | vgg: 1 | smoothing methods: 1 | blstm: 1 | conversational speech recognition: 1 | lace: 1 | convolutional neural networks: 1 | augmentation: 1 | room impulse responses: 1 | deep neural network: 1 | reverberation: 1 | linear augmented network: 1 | pre training: 1 | deep network: 1
Most Publications: 2021: 19 | 2019: 11 | 2020: 10 | 2022: 9 | 2017: 8

Affiliations
URLs

ICASSP2022 Antoine Bruguier, Duc Le, Rohit Prabhavalkar, Dangna Li, Zhe Liu 0011, Bo Wang, Eun Chang, Fuchun Peng, Ozlem Kalinli, Michael L. Seltzer
Neural-FST Class Language Model for End-to-End Speech Recognition.

Interspeech2022 Suyoun Kim, Duc Le, Weiyi Zheng, Tarun Singh, Abhinav Arora, Xiaoyu Zhai, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer
Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric.

Interspeech2022 Duc Le, Akshat Shrivastava, Paden D. Tomasello, Suyoun Kim, Aleksandr Livshits, Ozlem Kalinli, Michael L. Seltzer
Deliberation Model for On-Device Spoken Language Understanding.

Interspeech2022 Jay Mahadeokar, Yangyang Shi, Ke Li, Duc Le, Jiedan Zhu, Vikas Chandra, Ozlem Kalinli, Michael L. Seltzer
Streaming parallel transducer beam search with fast slow cascaded encoders.

ICASSP2021 Suyoun Kim, Yuan Shangguan, Jay Mahadeokar, Antoine Bruguier, Christian Fuegen, Michael L. Seltzer, Duc Le, 
Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer.

Interspeech2021 Suyoun Kim, Abhinav Arora, Duc Le, Ching-Feng Yeh, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer
Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding.

Interspeech2021 Duc Le, Mahaveer Jain, Gil Keren, Suyoun Kim, Yangyang Shi, Jay Mahadeokar, Julian Chan, Yuan Shangguan, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Michael L. Seltzer
Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion.

Interspeech2021 Jay Mahadeokar, Yangyang Shi, Yuan Shangguan, Chunyang Wu, Alex Xiao, Hang Su, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer
Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios.

Interspeech2021 Varun Nagaraja, Yangyang Shi, Ganesh Venkatesh, Ozlem Kalinli, Michael L. Seltzer, Vikas Chandra, 
Collaborative Training of Acoustic Encoders for Speech Recognition.

Interspeech2021 Yuan Shangguan, Rohit Prabhavalkar, Hang Su, Jay Mahadeokar, Yangyang Shi, Jiatong Zhou, Chunyang Wu, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer
Dissecting User-Perceived Latency of On-Device E2E Speech Recognition.

Interspeech2021 Yangyang Shi, Varun Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, Rohit Prabhavalkar, Alex Xiao, Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer
Dynamic Encoder Transducer: A Flexible Solution for Trading Off Accuracy for Latency.

ICASSP2020 Yi-Chen Chen, Zhaojun Yang, Ching-Feng Yeh, Mahaveer Jain, Michael L. Seltzer
Aipnet: Generative Adversarial Pre-Training of Accent-Invariant Networks for End-To-End Speech Recognition.

ICASSP2020 Duc Le, Thilo Köhler, Christian Fuegen, Michael L. Seltzer
G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR.

ICASSP2020 Yongqiang Wang 0005, Abdelrahman Mohamed, Duc Le, Chunxi Liu, Alex Xiao, Jay Mahadeokar, Hongzhao Huang, Andros Tjandra, Xiaohui Zhang, Frank Zhang 0001, Christian Fuegen, Geoffrey Zweig, Michael L. Seltzer
Transformer-Based Acoustic Modeling for Hybrid Speech Recognition.

Interspeech2020 Yangyang Shi, Yongqiang Wang 0005, Chunyang Wu, Christian Fuegen, Frank Zhang 0001, Duc Le, Ching-Feng Yeh, Michael L. Seltzer
Weak-Attention Suppression for Transformer Based Speech Recognition.

ICASSP2019 Zhehuai Chen, Mahaveer Jain, Yongqiang Wang 0005, Michael L. Seltzer, Christian Fuegen, 
End-to-end Contextual Speech Recognition Using Class Language Models and a Token Passing Decoder.

Interspeech2019 Zhehuai Chen, Mahaveer Jain, Yongqiang Wang 0005, Michael L. Seltzer, Christian Fuegen, 
Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR.

ICASSP2018 Suyoun Kim, Michael L. Seltzer
Towards Language-Universal End-to-End Speech Recognition.

Interspeech2018 Suyoun Kim, Michael L. Seltzer, Jinyu Li 0001, Rui Zhao 0017, 
Improved Training for Online End-to-end Speech Recognition Systems.

TASLP2017 Wayne Xiong, Jasha Droppo, Xuedong Huang 0001, Frank Seide, Michael L. Seltzer, Andreas Stolcke, Dong Yu 0001, Geoffrey Zweig, 
Toward Human Parity in Conversational Speech Recognition.

#180  | Man-Wai Mak | Google Scholar   DBLP
Venues: Interspeech: 9 | ICASSP: 8 | TASLP: 7
Years: 2022: 6 | 2021: 3 | 2020: 6 | 2019: 2 | 2018: 2 | 2017: 3 | 2016: 2
ISCA Sections: speaker recognition: 3 | speaker and language recognition: 1 | atypical speech analysis and detection: 1 | feature, embedding and neural architecture for speaker recognition: 1 | speaker embedding: 1 | speaker recognition evaluation: 1 | speaker characterization and recognition: 1
IEEE Keywords: speaker recognition: 14 | speaker verification: 10 | domain adaptation: 7 | i vectors: 5 | maximum mean discrepancy: 4 | statistics pooling: 3 | data augmentation: 3 | gaussian processes: 2 | deep neural networks: 2 | speaker embedding: 2 | spectral analysis: 2 | mutual information: 2 | transfer learning: 2 | speaker verification (sv): 2 | probability: 2 | pattern clustering: 2 | mixture of plda: 2 | gumbel softmax: 1 | attention models: 1 | self attention: 1 | short time fourier transform: 1 | audio signal processing: 1 | hyper parameter optimization: 1 | robust speaker verification: 1 | bayes methods: 1 | search problems: 1 | filtering theory: 1 | parameter estimation: 1 | population based learning: 1 | self supervised learning: 1 | alzheimer's disease detection: 1 | features: 1 | cognition: 1 | adress: 1 | medical diagnostic computing: 1 | geriatrics: 1 | asr: 1 | natural language processing: 1 | diseases: 1 | signal classification: 1 | patient diagnosis: 1 | linguistics: 1 | speech recognition: 1 | signal representation: 1 | spectral pooling: 1 | domain adversarial training: 1 | gaussian distribution: 1 | variational autoencoder: 1 | x vectors: 1 | speech coding: 1 | statistical distributions: 1 | statistics: 1 | l vectors: 1 | dnn driven mixture of plda: 1 | spectral clustering: 1 | mixture models: 1 | noise robustness: 1 | probabilistic lda: 1 | snr invariant plda: 1 | speaker subspaces: 1 | snr subspaces: 1
Most Publications: 2022: 12 | 2020: 11 | 2018: 11 | 2016: 10 | 2017: 9

Affiliations
Hong Kong Polytechnic University, Hong Kong

TASLP2022 Weiwei Lin 0002, Man-Wai Mak
Mixture Representation Learning for Deep Speaker Embedding.

TASLP2022 Youzhi Tu, Man-Wai Mak
Aggregating Frame-Level Information in the Spectral Domain With Self-Attention for Speaker Embedding.

ICASSP2022 Weiwei Lin 0002, Man-Wai Mak
Robust Speaker Verification Using Population-Based Data Augmentation.

ICASSP2022 Lu Yi, Man-Wai Mak
Disentangled Speaker Embedding for Robust Speaker Verification.

Interspeech2022 Zhenke Gao, Man-Wai Mak, Weiwei Lin 0002, 
UNet-DenseNet for Robust Far-Field Speaker Verification.

Interspeech2022 Xiaoquan Ke, Man-Wai Mak, Helen M. Meng, 
Automatic Selection of Discriminative Features for Dementia Detection in Cantonese-Speaking People.

ICASSP2021 Jinchao Li, Jianwei Yu, Zi Ye, Simon Wong, Man-Wai Mak, Brian Mak, Xunying Liu, Helen Meng, 
A Comparative Study of Acoustic and Linguistic Features Classification for Alzheimer's Disease Detection.

ICASSP2021 Youzhi Tu, Man-Wai Mak
Short-Time Spectral Aggregation for Speaker Embedding.

Interspeech2021 Youzhi Tu, Man-Wai Mak
Mutual Information Enhanced Training for Speaker Embedding.

TASLP2020 Weiwei Lin 0002, Man-Wai Mak, Na Li 0012, Dan Su 0002, Dong Yu 0001, 
A Framework for Adapting DNN Speaker Embedding Across Languages.

TASLP2020 Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien, 
Variational Domain Adversarial Learning With Mutual Information Maximization for Speaker Verification.

ICASSP2020 Weiwei Lin 0002, Man-Wai Mak, Na Li 0012, Dan Su 0002, Dong Yu 0001, 
Multi-Level Deep Neural Network Adaptation for Speaker Verification Using MMD and Consistency Regularization.

Interspeech2020 Wei-Wei Lin 0002, Man-Wai Mak
Wav2Spk: A Simple DNN Architecture for Learning Speaker Embeddings from Waveforms.

Interspeech2020 Weiwei Lin 0002, Man-Wai Mak, Jen-Tzung Chien, 
Strategies for End-to-End Text-Independent Speaker Verification.

Interspeech2020 Lu Yi, Man-Wai Mak
Adversarial Separation and Adaptation Network for Far-Field Speaker Verification.

ICASSP2019 Wei-Wei Lin 0002, Man-Wai Mak, Youzhi Tu, Jen-Tzung Chien, 
Semi-supervised Nuisance-attribute Networks for Domain Adaptation.

Interspeech2019 Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien, 
Variational Domain Adversarial Learning for Speaker Verification.

TASLP2018 Wei-Wei Lin 0002, Man-Wai Mak, Jen-Tzung Chien, 
Multisource I-Vectors Domain Adaptation Using Maximum Mean Discrepancy Based Autoencoders.

ICASSP2018 Longxin Li, Man-Wai Mak
Unsupervised Domain Adaptation for Gender-Aware PLDA Mixture Models.

TASLP2017 Na Li, Man-Wai Mak, Jen-Tzung Chien, 
DNN-Driven Mixture of PLDA for Robust Speaker Verification.

#181  | Simon Doclo | Google Scholar   DBLP
Venues: TASLP: 13 | ICASSP: 10 | Interspeech: 1
Years: 2021: 1 | 2020: 2 | 2019: 2 | 2018: 5 | 2017: 5 | 2016: 9
ISCA Sections: intelligibility-enhancing speech modification: 1
IEEE Keywords: reverberation: 11 | speech enhancement: 10 | wiener filters: 8 | maximum likelihood estimation: 6 | dereverberation: 6 | psd estimation: 6 | filtering theory: 5 | fourier transforms: 4 | transient response: 4 | microphones: 4 | noise reduction: 4 | least squares approximations: 4 | hearing aids: 4 | microphone arrays: 4 | speech dereverberation: 4 | optimisation: 3 | adaptive filters: 3 | array signal processing: 3 | mwf: 3 | evd: 3 | eigenvalues and eigenfunctions: 3 | mean square error methods: 2 | parameter estimation: 2 | spectral analysis: 2 | transfer functions: 2 | beamforming: 2 | computational complexity: 2 | signal denoising: 2 | sparsity: 2 | acoustic multi channel equalization: 2 | instrumental measures: 2 | architectural acoustics: 2 | speech: 2 | loudspeakers: 2 | equalisers: 2 | speech intelligibility: 2 | acoustic convolution: 2 | multi frame mvdr filter: 1 | speech interframe correlation: 1 | correlation methods: 1 | single microphone speech enhancement: 1 | residual echo suppression: 1 | minimisation: 1 | late residual echo estimation: 1 | acoustic echo cancellation: 1 | telephony: 1 | gradient methods: 1 | echo suppression: 1 | multi channel linear prediction: 1 | deconvolution: 1 | data dependent beamforming: 1 | retf estimation: 1 | speaker recognition: 1 | diffuse sound: 1 | array processing: 1 | power spectral density estimation: 1 | prewhitening: 1 | ml: 1 | estimation theory: 1 | binaural cues: 1 | multi channel wiener filter: 1 | interaural coherence: 1 | late reverberation: 1 | noise: 1 | matrix algebra: 1 | least squares: 1 | iterative methods: 1 | decomposition: 1 | complexity reduction: 1 | power method: 1 | inverse problems: 1 | channel estimation: 1 | signal dependent penalty function: 1 | admm: 1 | voice communication: 1 | cramer rao bounds: 1 | perceived reverberation: 1 | experimental listening tests: 1 | audio signal processing: 1 | late reverberant psd: 1 | prewhitening: 1 | direction of arrival estimation: 1 | least mean squares methods: 1 | medical signal processing: 1 | prediction error methods: 1 | adaptive feedback control: 1 | music: 1 | pnlms: 1 | error analysis: 1 | ipnlms: 1 | automatic parameter selection: 1 | acoustic communication (telecommunication): 1 | telecommunication channels: 1 | l hypersurface: 1 | error statistics: 1 | microphone array: 1 | cramér–rao lower bound: 1 | non negative matrix factorization: 1 | reverberation chambers: 1 | spectral modeling: 1 | non negative convolutive transfer function: 1 | regularization: 1 | white noise gain (wng): 1 | head related transfer functions (hrtfs): 1 | filters: 1 | virtual artificial head: 1 | mathematical programming: 1 | common part modeling: 1 | min max optimization: 1 | minimax techniques: 1 | acoustic feedback cancellation: 1 | maximum stable gain: 1 | reliability: 1 | level measurement: 1 | perceptual evaluation: 1 | rir estimation errors: 1 | robustness: 1 | maximum likelihood estimator: 1 | multi microphone: 1 | isotropic sound field: 1 | feedback cancellation: 1 | affine combination: 1 | pbfdaf: 1
Most Publications: 2015: 31 | 2016: 28 | 2018: 24 | 2017: 24 | 2022: 22


TASLP2021 Dörte Fischer, Simon Doclo
Robust Constrained MFMVDR Filters for Single-Channel Speech Enhancement Based on Spherical Uncertainty Set.

TASLP2020 Naveen Kumar Desiraju, Simon Doclo, Markus Buck, Tobias Wolff, 
Online Estimation of Reverberation Parameters For Late Residual Echo Suppression.

Interspeech2020 Felicitas Bederna, Henning F. Schepker, Christian Rollwage, Simon Doclo, Arne Pusch, Jörg Bitzer, Jan Rennies, 
Adaptive Compressive Onset-Enhancement for Improved Speech Intelligibility in Noise and Reverberation.

TASLP2019 Thomas Dietzen, Ann Spriet, Wouter Tirry, Simon Doclo, Marc Moonen, Toon van Waterschoot, 
Comparative Analysis of Generalized Sidelobe Cancellation and Multi-Channel Linear Prediction for Speech Dereverberation and Noise Reduction.

ICASSP2019 Marvin Tammen, Simon Doclo, Ina Kodrasi, 
Joint Estimation of RETF Vector and Power Spectral Densities for Speech Enhancement Based on Alternating Least Squares.

TASLP2018 Sebastian Braun, Adam Kuklasinski, Ofer Schwartz, Oliver Thiergart, Emanuël A. P. Habets, Sharon Gannot, Simon Doclo, Jesper Jensen 0001, 
Evaluation and Comparison of Late Reverberation Power Spectral Density Estimators.

TASLP2018 Ina Kodrasi, Simon Doclo
Analysis of Eigenvalue Decomposition-Based Late Reverberation Power Spectral Density Estimation.

TASLP2018 Daniel Marquardt, Simon Doclo
Interaural Coherence Preservation for Binaural Noise Reduction Using Partial Noise Estimation and Spectral Postfiltering.

ICASSP2018 Ina Kodrasi, Simon Doclo
Joint Late Reverberation and Noise Power Spectral Density Estimation in a Spatially Homogeneous Noise Field.

ICASSP2018 Marvin Tammen, Ina Kodrasi, Simon Doclo
Complexity Reduction of Eigenvalue Decomposition-Based Diffuse Power Spectral Density Estimators Using the Power Method.

TASLP2017 Ina Kodrasi, Simon Doclo
Signal-Dependent Penalty Functions for Robust Acoustic Multi-Channel Equalization.

TASLP2017 Adam Kuklasinski, Simon Doclo, Søren Holdt Jensen, Jesper Rindom Jensen, 
Correction to "Maximum Likelihood PSD Estimation for Speech Enhancement in Reverberation and Noise".

ICASSP2017 Hamza A. Javed, Benjamin Cauchi, Simon Doclo, Patrick A. Naylor, Stefan Goetze, 
Measuring, modelling and predicting perceived reverberation.

ICASSP2017 Ina Kodrasi, Simon Doclo
Late reverberant power spectral density estimation based on an eigenvalue decomposition.

ICASSP2017 Linh Thi Thuc Tran, Henning F. Schepker, Simon Doclo, Hai Huyen Dam, Sven Nordholm, 
Proportionate NLMS for adaptive feedback control in hearing aids.

TASLP2016 Ina Kodrasi, Simon Doclo
Joint Dereverberation and Noise Reduction Based on Acoustic Multi-Channel Equalization.

TASLP2016 Adam Kuklasinski, Simon Doclo, Søren Holdt Jensen, Jesper Jensen 0001, 
Maximum Likelihood PSD Estimation for Speech Enhancement in Reverberation and Noise.

TASLP2016 Nasser Mohammadiha, Simon Doclo
Speech Dereverberation Using Non-Negative Convolutive Transfer Function and Spectro-Temporal Modeling.

TASLP2016 Eugen Rasumow, Martin Hansen, Steven van de Par, Dirk Puschel, Volker Mellert, Simon Doclo, Matthias Blau, 
Regularization Approaches for Synthesizing HRTF Directivity Patterns.

TASLP2016 Henning F. Schepker, Simon Doclo
A Semidefinite Programming Approach to Min-max Estimation of the Common Part of Acoustic Feedback Paths in Hearing Aids.

#182  | Andrew Rosenberg | Google Scholar   DBLP
Venues: Interspeech: 14 | ICASSP: 10
Years: 2022: 6 | 2021: 2 | 2020: 3 | 2019: 2 | 2018: 3 | 2017: 5 | 2016: 3
ISCA Sections: self-supervised, semi-supervised, adaptation and data augmentation for asr: 2 | speech synthesis: 2 | resource-constrained asr: 1 | novel models and training methods for asr: 1 | speech representation: 1 | self-supervision and semi-supervision for neural asr training: 1 | asr neural network architectures and training: 1 | training strategies for asr: 1 | adjusting to speaker, accent, and domain: 1 | prosody and text processing: 1 | behavioral signal processing and speaker state and traits analytics: 1 | special session: 1
IEEE Keywords: speech recognition: 8 | natural language processing: 5 | speech synthesis: 3 | text analysis: 3 | speaker recognition: 2 | data augmentation: 2 | keyword search: 2 | automatic speech recognition: 2 | active learning: 2 | consistency regularization: 1 | self supervised: 1 | speech normalization: 1 | speech impairments: 1 | sequence to sequence model: 1 | voice conversion: 1 | speech coding: 1 | entropy: 1 | language model adaptation: 1 | code switched automatic speech recognition: 1 | computational linguistics: 1 | prosody prediction: 1 | low resources: 1 | multi task learning: 1 | acoustic modeling: 1 | multi accent speech recognition: 1 | end to end models: 1 | query processing: 1 | recurrent neural nets: 1 | audio coding: 1 | end to end systems: 1 | feedforward neural nets: 1 | attention networks: 1 | end to end speech recognition: 1 | ctc: 1 | low resource languages: 1 | language model: 1 | data selection: 1 | supervised active learning: 1 | unsupervised learning: 1 | limited resource automatic speech recognition: 1 | unsupervised active learning: 1
Most Publications: 2022: 18 | 2017: 12 | 2014: 10 | 2012: 8 | 2018: 7

Affiliations
URLs

ICASSP2022 Zhehuai Chen, Yu Zhang 0033, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno 0001, Gary Wang, 
Tts4pretrain 2.0: Advancing the use of Text and Speech in ASR Pretraining with Consistency and Contrastive Losses.

Interspeech2022 Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang 0033, Nicolás Serrano, 
Reducing Domain mismatch in Self-supervised speech pre-training.

Interspeech2022 Fadi Biadsy, Youzheng Chen, Xia Zhang, Oleg Rybakov, Andrew Rosenberg, Pedro J. Moreno 0001, 
A Scalable Model Specialization Framework for Training and Inference using Submodels and its Application to Speech Model Personalization.

Interspeech2022 Zhehuai Chen, Yu Zhang 0033, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno 0001, Ankur Bapna, Heiga Zen, 
MAESTRO: Matched Speech Text Representations through Modality Matching.

Interspeech2022 Cal Peyser, W. Ronny Huang, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho, 
Towards Disentangled Speech Representations.

Interspeech2022 Gary Wang, Andrew Rosenberg, Bhuvana Ramabhadran, Fadi Biadsy, Jesse Emond, Yinghui Huang, Pedro J. Moreno 0001, 
Non-Parallel Voice Conversion for ASR Augmentation.

ICASSP2021 Rohan Doshi, Youzheng Chen, Liyang Jiang, Xia Zhang, Fadi Biadsy, Bhuvana Ramabhadran, Fang Chu, Andrew Rosenberg, Pedro J. Moreno 0001, 
Extending Parrotron: An End-to-End, Speech Conversion and Speech Recognition Model for Atypical Speech.

Interspeech2021 Zhehuai Chen, Andrew Rosenberg, Yu Zhang 0033, Heiga Zen, Mohammadreza Ghodsi, Yinghui Huang, Jesse Emond, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation.

ICASSP2020 Gary Wang, Andrew Rosenberg, Zhehuai Chen, Yu Zhang 0033, Bhuvana Ramabhadran, Yonghui Wu, Pedro J. Moreno 0001, 
Improving Speech Recognition Using Consistent Predictions on Synthesized Speech.

Interspeech2020 Zhehuai Chen, Andrew Rosenberg, Yu Zhang 0033, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
Improving Speech Recognition Using GAN-Based Speech Synthesis and Contrastive Unspoken Text Selection.

Interspeech2020 Gary Wang, Andrew Rosenberg, Zhehuai Chen, Yu Zhang 0033, Bhuvana Ramabhadran, Pedro J. Moreno 0001, 
SCADA: Stochastic, Consistent and Adversarial Data Augmentation to Improve ASR.

ICASSP2019 Min Ma, Bhuvana Ramabhadran, Jesse Emond, Andrew Rosenberg, Fadi Biadsy, 
Comparison of Data Augmentation and Adaptation Strategies for Code-switched Automatic Speech Recognition.

Interspeech2019 Yu Zhang 0033, Ron J. Weiss, Heiga Zen, Yonghui Wu, Zhifeng Chen, R. J. Skerry-Ryan, Ye Jia, Andrew Rosenberg, Bhuvana Ramabhadran, 
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning.

ICASSP2018 Andrew Rosenberg, Raul Fernandez, Bhuvana Ramabhadran, 
Measuring the Effect of Linguistic Resources on Prosody Modeling for Speech Synthesis.

ICASSP2018 Xuesong Yang, Kartik Audhkhasi, Andrew Rosenberg, Samuel Thomas 0001, Bhuvana Ramabhadran, Mark Hasegawa-Johnson, 
Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition.

Interspeech2018 Takashi Fukuda, Raul Fernandez, Andrew Rosenberg, Samuel Thomas 0001, Bhuvana Ramabhadran, Alexander Sorin, Gakuto Kurata, 
Data Augmentation Improves Recognition of Foreign Accented Speech.

ICASSP2017 Kartik Audhkhasi, Andrew Rosenberg, Abhinav Sethy, Bhuvana Ramabhadran, Brian Kingsbury, 
End-to-end ASR-free keyword search from speech.

ICASSP2017 Andrew Rosenberg, Kartik Audhkhasi, Abhinav Sethy, Bhuvana Ramabhadran, Michael Picheny, 
End-to-end speech recognition and keyword search on low-resource languages.

ICASSP2017 Ali Raza Syed, Andrew Rosenberg, Michael I. Mandel, 
Active learning for low-resource speech recognition: Impact of selection size and language modeling data.

Interspeech2017 Asaf Rendel, Raul Fernandez, Zvi Kons, Andrew Rosenberg, Ron Hoory, Bhuvana Ramabhadran, 
Weakly-Supervised Phrase Assignment from Text in a Speech-Synthesis System Using Noisy Labels.

#183  | Sabato Marco Siniscalchi | Google Scholar   DBLP
Venues: Interspeech: 13 | ICASSP: 6 | TASLP: 5
Years: 2022: 3 | 2021: 4 | 2020: 7 | 2019: 3 | 2018: 1 | 2017: 4 | 2016: 2
ISCA Sections: acoustic scene classification: 2 | bioacoustics and articulation: 2 | spoken language processing: 1 | spoken dialogue systems and multimodality: 1 | speech signal analysis and representation: 1 | privacy-preserving machine learning for audio & speech processing: 1 | multi-channel speech enhancement: 1 | speech synthesis: 1 | noise robust and far-field asr: 1 | speech recognition: 1 | special session: 1
IEEE Keywords: speech recognition: 6 | automatic speech recognition: 3 | natural language processing: 3 | regression analysis: 3 | speech enhancement: 3 | deep neural network: 2 | computer assisted pronunciation training (capt): 2 | transfer learning: 2 | misp challenge: 1 | audio visual systems: 1 | wake word spotting: 1 | public domain software: 1 | audio visual: 1 | speaker recognition: 1 | microphone array: 1 | estimation theory: 1 | acoustic to articulatory inversion: 1 | fbe: 1 | deep learning (artificial intelligence): 1 | feedforward neural nets: 1 | dnn: 1 | federated learning: 1 | acoustic modeling: 1 | recurrent neural nets: 1 | data privacy: 1 | quantum machine learning: 1 | backpropagation: 1 | convolutional recurrent neural network: 1 | deep bottleneck features: 1 | spoken language recognition: 1 | speech articulatory attributes: 1 | maximal figure of merit: 1 | tensors: 1 | tensor train network: 1 | tensor to vector regression: 1 | non native tone modeling and mispronunciation detection: 1 | pattern classification: 1 | computer assisted language learning (call): 1 | vector to vector regression: 1 | expressive power: 1 | function approximation: 1 | universal approximation: 1 | probability: 1 | tone recognition and mispronunciation detection: 1 | computer assistant language learning (call): 1 | bayesian learning: 1 | unsupervised speaker adaptation: 1 | bayes methods: 1 | maximum likelihood estimation: 1 | hidden markov models: 1 | deep neural networks: 1 | prior evolution: 1 | online adaptation: 1 | model stacking: 1 | multi task training: 1 | model compression: 1 | english corpus: 1 | i vector system: 1 | attribute detectors: 1 | signal representation: 1 | natural languages: 1 | finnish corpus: 1 | linguistics: 1
Most Publications: 2020: 21 | 2022: 15 | 2021: 14 | 2017: 10 | 2013: 9

Affiliations
URLs

ICASSP2022 Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe 0001, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu 0006, 
The First Multimodal Information Based Speech Processing (Misp) Challenge: Data, Tasks, Baselines And Results.

Interspeech2022 Hang Chen, Jun Du, Yusheng Dai, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Baocai Yin, Jia Pan, 
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.

Interspeech2022 Hengshun Zhou, Jun Du, Gongzhen Zou, Zhaoxu Nian, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe 0001, Odette Scharenborg, Jingdong Chen, Shifu Xiong, Jianqing Gao, 
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis.

ICASSP2021 Abdolreza Sabzi Shahrebabaki, Negar Olfati, Ali Shariq Imran, Magne Hallstein Johnsen, Sabato Marco Siniscalchi, Torbjørn Svendsen, 
A Two-Stage Deep Modeling Approach to Articulatory Inversion.

ICASSP2021 Chao-Han Huck Yang, Jun Qi 0002, Samuel Yen-Chi Chen, Pin-Yu Chen, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee, 
Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition.

Interspeech2021 Abdolreza Sabzi Shahrebabaki, Sabato Marco Siniscalchi, Torbjørn Svendsen, 
Raw Speech-to-Articulatory Inversion by Temporal Filtering and Decimation.

Interspeech2021 Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee, 
PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification.

TASLP2020 Ivan Kukanov, Trung Ngo Trong, Ville Hautamäki, Sabato Marco Siniscalchi, Valerio Mario Salerno, Kong Aik Lee, 
Maximal Figure-of-Merit Framework to Detect Multi-Label Phonetic Features for Spoken Language Recognition.

ICASSP2020 Jun Qi 0002, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee, 
Tensor-To-Vector Regression for Multi-Channel Speech Enhancement Based on Tensor-Train Network.

Interspeech2020 Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Xue Bai, Jun Du, Chin-Hui Lee, 
An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances.

Interspeech2020 Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee, 
Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification.

Interspeech2020 Jun Qi 0002, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee, 
Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement.

Interspeech2020 Abdolreza Sabzi Shahrebabaki, Negar Olfati, Sabato Marco Siniscalchi, Giampiero Salvi, Torbjørn Svendsen, 
Transfer Learning of Articulatory Information Through Phone Information.

Interspeech2020 Abdolreza Sabzi Shahrebabaki, Sabato Marco Siniscalchi, Giampiero Salvi, Torbjørn Svendsen, 
Sequence-to-Sequence Articulatory Inversion Through Time Convolution of Sub-Band Frequency Signals.

TASLP2019 Wei Li 0119, Nancy F. Chen, Sabato Marco Siniscalchi, Chin-Hui Lee, 
Improving Mispronunciation Detection of Mandarin Tones for Non-Native Learners With Soft-Target Tone Labels and BLSTM-Based Deep Tone Models.

TASLP2019 Jun Qi 0002, Jun Du, Sabato Marco Siniscalchi, Chin-Hui Lee, 
A Theory on Deep Neural Network Based Vector-to-Vector Regression With an Illustration of Its Expressive Power in Speech Enhancement.

Interspeech2019 Abdolreza Sabzi Shahrebabaki, Negar Olfati, Ali Shariq Imran, Sabato Marco Siniscalchi, Torbjørn Svendsen, 
A Phonetic-Level Analysis of Different Input Features for Articulatory Inversion.

ICASSP2018 Wei Li 0119, Nancy F. Chen, Sabato Marco Siniscalchi, Chin-Hui Lee, 
Improving Mandarin Tone Mispronunciation Detection for Non-Native Learners with Soft-Target Tone Labels and BLSTM-Based Deep Models.

TASLP2017 Zhen Huang 0001, Sabato Marco Siniscalchi, Chin-Hui Lee, 
Bayesian Unsupervised Batch and Online Speaker Adaptation of Activation Function Parameters in Deep Models for Automatic Speech Recognition.

ICASSP2017 Sicheng Wang, Kehuang Li, Zhen Huang 0001, Sabato Marco Siniscalchi, Chin-Hui Lee, 
A transfer learning and progressive stacking approach to reducing deep model sizes with an application to speech enhancement.

#184  | Shi-Xiong Zhang | Google Scholar   DBLP
Venues: Interspeech: 12 | ICASSP: 9 | TASLP: 3
Years: 2022: 3 | 2021: 9 | 2020: 6 | 2019: 4 | 2018: 1 | 2016: 1
ISCA Sections: source separation, dereverberation and echo cancellation: 2 | dereverberation and echo cancellation: 1 | speaker recognition: 1 | source separation: 1 | speech localization, enhancement, and quality assessment: 1 | topics in asr: 1 | multi-channel speech enhancement: 1 | multimodal speech processing: 1 | speech and audio source separation and scene analysis: 1 | speech enhancement: 1 | asr for noisy and far-field speech: 1
IEEE Keywords: speech recognition: 11 | speech separation: 4 | speech enhancement: 3 | source separation: 3 | application program interfaces: 2 | reverberation: 2 | audio visual systems: 2 | filtering theory: 2 | microphone arrays: 2 | end to end speech recognition: 2 | speaker recognition: 2 | support vector machines: 2 | acoustic environment: 1 | speech simulation: 1 | graphics processing units: 1 | transient response: 1 | computational linguistics: 1 | code switched asr: 1 | bilingual asr: 1 | natural language processing: 1 | rnn t: 1 | audio visual processing: 1 | sensor fusion: 1 | speech synthesis: 1 | audio signal processing: 1 | sound source separation: 1 | multi channel: 1 | audio visual: 1 | jointly fine tuning: 1 | visual occlusion: 1 | overlapped speech recognition: 1 | image recognition: 1 | video signal processing: 1 | adl mvdr: 1 | array signal processing: 1 | mvdr: 1 | source localization: 1 | direction of arrival estimation: 1 | multi channel speech separation: 1 | spatial features: 1 | end to end: 1 | spatial filters: 1 | inter channel convolution differences: 1 | target speech extraction: 1 | signal reconstruction: 1 | minimisation: 1 | neural beamformer: 1 | overlapped speech: 1 | audio visual speech recognition: 1 | multi modal: 1 | speech coding: 1 | decoding: 1 | privacy preserving: 1 | dnn: 1 | cloud computing: 1 | quantization: 1 | encryption: 1 | polynomials: 1 | cryptography: 1 | probability: 1 | text dependent speaker verification: 1 | dynamic time warping: 1 | text analysis: 1 | sequential speaker characteristics: 1 | speaker supervector: 1 | recurrent neural nets: 1 | lstm: 1 | sequence training: 1 | maximum margin: 1 | svm: 1
Most Publications: 2021: 24 | 2020: 19 | 2019: 12 | 2022: 10 | 2023: 3

ICASSP2022 Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu 0003, Zhenyu Tang 0001, Dinesh Manocha, Dong Yu 0001, 
Fast-Rir: Fast Neural Diffuse Room Impulse Response Generator.

ICASSP2022 Brian Yan, Chunlei Zhang, Meng Yu 0003, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe 0001, Dong Yu 0001, 
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization.

Interspeech2022 Vinay Kothapally, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Dong Yu 0001, 
Joint Neural AEC and Beamforming with Double-Talk Detection.

TASLP2021 Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu 0004, Meng Yu 0003, Dong Yu 0001, Jesper Jensen 0001, 
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation.

TASLP2021 Jianwei Yu, Shi-Xiong Zhang, Bo Wu, Shansong Liu, Shoukang Hu, Mengzhe Geng, Xunying Liu, Helen Meng, Dong Yu 0001, 
Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech.

TASLP2021 Zhuohuang Zhang, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Lianwu Chen, Donald S. Williamson, Dong Yu 0001, 
Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation.

ICASSP2021 Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe 0001, Meng Yu 0003, Yong Xu 0004, Shi-Xiong Zhang, Dong Yu 0001, 
Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization.

Interspeech2021 Saurabh Kataria, Shi-Xiong Zhang, Dong Yu 0001, 
Multi-Channel Speaker Verification for Single and Multi-Talker Speech.

Interspeech2021 Xiyun Li, Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Jiaming Xu 0001, Bo Xu 0002, Dong Yu 0001, 
MIMO Self-Attentive RNN Beamformer for Multi-Speaker Speech Separation.

Interspeech2021 Helin Wang, Bo Wu, Lianwu Chen, Meng Yu 0003, Jianwei Yu, Yong Xu 0004, Shi-Xiong Zhang, Chao Weng, Dan Su 0002, Dong Yu 0001, 
TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation.

Interspeech2021 Yong Xu 0004, Zhuohuang Zhang, Meng Yu 0003, Shi-Xiong Zhang, Dong Yu 0001, 
Generalized Spatio-Temporal RNN Beamformer for Target Speech Separation.

Interspeech2021 Meng Yu 0003, Chunlei Zhang, Yong Xu 0004, Shi-Xiong Zhang, Dong Yu 0001, 
MetricNet: Towards Improved Modeling For Non-Intrusive Speech Quality Assessment.

ICASSP2020 Rongzhi Gu, Shi-Xiong Zhang, Lianwu Chen, Yong Xu 0004, Meng Yu 0003, Dan Su 0002, Yuexian Zou, Dong Yu 0001, 
Enhancing End-to-End Multi-Channel Speech Separation Via Spatial Feature Learning.

ICASSP2020 Aswin Shanmugam Subramanian, Chao Weng, Meng Yu 0003, Shi-Xiong Zhang, Yong Xu 0004, Shinji Watanabe 0001, Dong Yu 0001, 
Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives.

ICASSP2020 Jianwei Yu, Shi-Xiong Zhang, Jian Wu 0027, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu 0001, 
Audio-Visual Recognition of Overlapped Speech for the LRS2 Dataset.

Interspeech2020 Shansong Liu, Xurong Xie, Jianwei Yu, Shoukang Hu, Mengzhe Geng, Rongfeng Su, Shi-Xiong Zhang, Xunying Liu, Helen Meng, 
Exploiting Cross-Domain Visual Feature Generation for Disordered Speech Recognition.

Interspeech2020 Yong Xu 0004, Meng Yu 0003, Shi-Xiong Zhang, Lianwu Chen, Chao Weng, Jianming Liu, Dong Yu 0001, 
Neural Spatio-Temporal Beamformer for Target Speech Separation.

Interspeech2020 Jianwei Yu, Bo Wu, Rongzhi Gu, Shi-Xiong Zhang, Lianwu Chen, Yong Xu 0004, Meng Yu 0003, Dan Su 0002, Dong Yu 0001, Xunying Liu, Helen Meng, 
Audio-Visual Multi-Channel Recognition of Overlapped Speech.

ICASSP2019 Shi-Xiong Zhang, Yifan Gong 0001, Dong Yu 0001, 
Encrypted Speech Recognition Using Deep Polynomial Networks.

Interspeech2019 Fahimeh Bahmaninezhad, Jian Wu 0027, Rongzhi Gu, Shi-Xiong Zhang, Yong Xu 0004, Meng Yu 0003, Dong Yu 0001, 
A Comprehensive Study of Speech Separation: Spectrogram vs Waveform Separation.

#185  | Lei He 0005 | Google Scholar   DBLP
Venues: Interspeech: 15 | ICASSP: 8 | SpeechComm: 1
Years: 2022: 8 | 2021: 2 | 2020: 6 | 2019: 5 | 2018: 1 | 2016: 2
ISCA Section: speech synthesis: 10 | voice conversion and adaptation: 1 | self-supervision and semi-supervision for neural asr training: 1 | acoustic model adaptation for asr: 1 | asr neural network architectures and training: 1 | voice conversion and speech synthesis: 1
IEEE Keyword: speech synthesis: 7 | speech recognition: 3 | text analysis: 3 | text to speech: 2 | prosody: 2 | natural language processing: 2 | speaker recognition: 2 | unsupervised learning: 2 | speaker adaptation: 2 | iterative methods: 1 | optimisation: 1 | probability: 1 | fast sampling: 1 | image denoising: 1 | vocoder: 1 | denoising diffusion probabilistic models: 1 | vocoders: 1 | medical image processing: 1 | emotion recognition: 1 | speech coding: 1 | mist: 1 | few shot: 1 | speech codecs: 1 | tts: 1 | attention: 1 | domain adaptation: 1 | machine speech chain: 1 | acoustic model adaptation: 1 | neural language generation: 1 | keyword spotting: 1 | recurrent neural nets: 1 | adaptation: 1 | rnn t: 1 | bert: 1 | neural tts: 1 | style transfer: 1 | variational autoencoder: 1 | knowledge representation: 1 | deep neural networks: 1 | statistical parametric speech synthesis: 1
Most Publications: 2022: 18 | 2021: 13 | 2020: 10 | 2019: 10 | 2023: 5

Affiliations
Microsoft China, Speech and Language Group, Beijing, China

ICASSP2022 Zehua Chen, Xu Tan 0003, Ke Wang, Shifeng Pan, Danilo P. Mandic, Lei He 0005, Sheng Zhao, 
Infergrad: Improving Diffusion Models for Vocoder by Considering Inference in Training.

ICASSP2022 Yuanhao Yi, Lei He 0005, Shifeng Pan, Xi Wang 0016, Yujia Xiao, 
Prosodyspeech: Towards Advanced Prosody Model for Neural Text-to-Speech.

ICASSP2022 Fengpeng Yue, Yan Deng, Lei He 0005, Tom Ko, Yu Zhang 0006, 
Exploring Machine Speech Chain For Domain Adaptation.

Interspeech2022 Mutian He 0001, Jingzhou Yang, Lei He 0005, Frank K. Soong, 
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge.

Interspeech2022 Yanqing Liu, Ruiqing Xue, Lei He 0005, Xu Tan 0003, Sheng Zhao, 
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders.

Interspeech2022 Yihan Wu, Xu Tan 0003, Bohan Li 0003, Lei He 0005, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu, 
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios.

Interspeech2022 Yihan Wu, Xi Wang 0016, Shaofei Zhang, Lei He 0005, Ruihua Song, Jian-Yun Nie, 
Self-supervised Context-aware Style Representation for Expressive Speech Synthesis.

Interspeech2022 Yuanhao Yi, Lei He 0005, Shifeng Pan, Xi Wang 0016, Yuchao Zhang, 
SoftSpeech: Unsupervised Duration Model in FastSpeech 2.

Interspeech2021 Yan Deng, Rui Zhao 0017, Zhong Meng, Xie Chen 0001, Bing Liu, Jinyu Li 0001, Yifan Gong 0001, Lei He 0005
Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS.

Interspeech2021 Shifeng Pan, Lei He 0005
Cross-Speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis.

ICASSP2020 Yan Huang 0028, Lei He 0005, Wenning Wei, William Gale, Jinyu Li 0001, Yifan Gong 0001, 
Using Personalized Speech Synthesis and Neural Language Generator for Rapid Speaker Adaptation.

ICASSP2020 Eva Sharma, Guoli Ye, Wenning Wei, Rui Zhao 0017, Yao Tian, Jian Wu 0027, Lei He 0005, Ed Lin, Yifan Gong 0001, 
Adaptation of RNN Transducer with Text-To-Speech Technology for Keyword Spotting.

ICASSP2020 Yujia Xiao, Lei He 0005, Huaiping Ming, Frank K. Soong, 
Improving Prosody with Linguistic and Bert Derived Features in Multi-Speaker Based Mandarin Chinese Neural TTS.

Interspeech2020 Yan Huang 0028, Jinyu Li 0001, Lei He 0005, Wenning Wei, William Gale, Yifan Gong 0001, 
Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator.

Interspeech2020 Yang Cui, Xi Wang 0016, Lei He 0005, Frank K. Soong, 
An Efficient Subband Linear Prediction for LPCNet-Based Neural Synthesis.

Interspeech2020 Jinyu Li 0001, Rui Zhao 0017, Zhong Meng, Yanqing Liu, Wenning Wei, Sarangarajan Parthasarathy, Vadim Mazalov, Zhenghao Wang, Lei He 0005, Sheng Zhao, Yifan Gong 0001, 
Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability.

ICASSP2019 Yajie Zhang, Shifeng Pan, Lei He 0005, Zhen-Hua Ling, 
Learning Latent Representations for Style Control and Transfer in End-to-end Speech Synthesis.

Interspeech2019 Haohan Guo, Frank K. Soong, Lei He 0005, Lei Xie 0001, 
A New GAN-Based End-to-End TTS Training Algorithm.

Interspeech2019 Haohan Guo, Frank K. Soong, Lei He 0005, Lei Xie 0001, 
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS.

Interspeech2019 Mutian He 0001, Yan Deng, Lei He 0005
Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS.

#186  | Xuankai Chang | Google Scholar   DBLP
Venues: Interspeech: 15 | ICASSP: 6 | ACL: 1 | TASLP: 1 | SpeechComm: 1
Years: 2022: 8 | 2021: 5 | 2020: 5 | 2019: 1 | 2018: 3 | 2017: 1 | 2016: 1
ISCA Section: non-autoregressive sequential modeling for speech processing: 2 | spoken language understanding: 1 | robust asr, and far-field/multi-talker asr: 1 | speech enhancement and intelligibility: 1 | speech synthesis: 1 | low-resource speech recognition: 1 | speech signal analysis and representation: 1 | asr neural network architectures and training: 1 | neural networks for language modeling: 1 | noise robust and distant speech recognition: 1 | asr neural network training: 1 | robust speech recognition: 1 | noise robust speech recognition: 1 | spoken term detection: 1
IEEE Keyword: speech recognition: 6 | natural language processing: 3 | transformer: 2 | permutation invariant training: 2 | recurrent neural nets: 2 | language translation: 1 | spoken language understanding: 1 | public domain software: 1 | open source: 1 | text analysis: 1 | speech based user interfaces: 1 | pattern classification: 1 | multi speaker overlapped speech: 1 | end to end asr: 1 | graph theory: 1 | gtc: 1 | wfst: 1 | ctc: 1 | bic: 1 | acoustic unit discovery: 1 | unit based language model: 1 | hubert: 1 | interactive systems: 1 | self supervised learning: 1 | end to end speech processing: 1 | conformer: 1 | speaker recognition: 1 | end to end model: 1 | multi talker mixed speech recognition: 1 | knowledge distillation: 1 | curriculum learning: 1 | decoding: 1 | end to end: 1 | speech separation: 1 | neural beamforming: 1 | overlapped speech recognition: 1 | reverberation: 1 | multi talker speech recognition: 1 | auxiliary features: 1 | speaker adaptation: 1
Most Publications: 2022: 28 | 2021: 17 | 2020: 14 | 2023: 7 | 2018: 6

ICASSP2022 Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan 0003, Brian Yan, Ngoc Thang Vu, Alan W. Black, Shinji Watanabe 0001, 
ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet.

ICASSP2022 Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe 0001, Jonathan Le Roux, 
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR.

ICASSP2022 Takashi Maekaku, Xuankai Chang, Yuya Fujita, Shinji Watanabe 0001, 
An Exploration of Hubert with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion.

Interspeech2022 Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan W. Black, Shinji Watanabe 0001, 
Two-Pass Low Latency End-to-End Spoken Language Understanding.

Interspeech2022 Xuankai Chang, Takashi Maekaku, Yuya Fujita, Shinji Watanabe 0001, 
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation.

Interspeech2022 Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao 0001, Yanmin Qian, Shinji Watanabe 0001, 
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.

Interspeech2022 Jiatong Shi, Shuai Guo, Tao Qian, Tomoki Hayashi, Yuning Wu, Fangzheng Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe 0001, Qin Jin, 
Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis.

ACL2022 Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-Wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li 0001, Shinji Watanabe 0001, Abdelrahman Mohamed, Hung-yi Lee, 
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities.

ICASSP2021 Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi 0003, Shinji Watanabe 0001, Kun Wei, Wangyou Zhang, Yuekai Zhang, 
Recent Developments on Espnet Toolkit Boosted By Conformer.

Interspeech2021 Pengcheng Guo, Xuankai Chang, Shinji Watanabe 0001, Lei Xie 0001, 
Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain.

Interspeech2021 Takashi Maekaku, Xuankai Chang, Yuya Fujita, Li-Wei Chen, Shinji Watanabe 0001, Alexander I. Rudnicky, 
Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021.

Interspeech2021 Tianzi Wang, Yuya Fujita, Xuankai Chang, Shinji Watanabe 0001, 
Streaming End-to-End ASR Based on Blockwise Non-Autoregressive Models.

Interspeech2021 Shu-Wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li 0001, Shinji Watanabe 0001, Abdelrahman Mohamed, Hung-yi Lee, 
SUPERB: Speech Processing Universal PERformance Benchmark.

TASLP2020 Wangyou Zhang, Xuankai Chang, Yanmin Qian, Shinji Watanabe 0001, 
Improving End-to-End Single-Channel Multi-Talker Speech Recognition.

ICASSP2020 Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe 0001, 
End-To-End Multi-Speaker Speech Recognition With Transformer.

Interspeech2020 Xuankai Chang, Aswin Shanmugam Subramanian, Pengcheng Guo, Shinji Watanabe 0001, Yuya Fujita, Motoi Omachi, 
End-to-End ASR with Adaptive Span Self-Attention.

Interspeech2020 Yuya Fujita, Shinji Watanabe 0001, Motoi Omachi, Xuankai Chang
Insertion-Based Modeling for End-to-End Automatic Speech Recognition.

Interspeech2020 Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Shinji Watanabe 0001, Yanmin Qian, 
End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming.

Interspeech2019 Wangyou Zhang, Xuankai Chang, Yanmin Qian, 
Knowledge Distillation for End-to-End Monaural Multi-Talker ASR System.

SpeechComm2018 Yanmin Qian, Xuankai Chang, Dong Yu 0001, 
Single-channel multi-talker speech recognition with permutation invariant training.

#187  | Chao Zhang 0031 | Google Scholar   DBLP
Venues: ICASSP: 12 | Interspeech: 11 | SpeechComm: 1
Years: 2023: 1 | 2022: 2 | 2021: 5 | 2020: 3 | 2019: 4 | 2018: 4 | 2017: 1 | 2016: 4
ISCA Section: neural transducers, streaming asr and novel asr models: 1 | robust asr, and far-field/multi-talker asr: 1 | neural network training methods for asr: 1 | speech synthesis: 1 | the interspeech 2020 far field speaker verification challenge: 1 | spatial audio: 1 | asr neural network architectures: 1 | speech and audio source separation and scene analysis: 1 | acoustic model adaptation: 1 | novel neural network architectures for acoustic modelling: 1 | new products and services: 1
IEEE Keyword: speech recognition: 9 | speaker recognition: 3 | d vector: 2 | deep neural network: 2 | transformer: 1 | lstm: 1 | language models: 1 | natural language processing: 1 | cross utterance: 1 | content aware speaker embedding: 1 | distributed representation: 1 | diarisation: 1 | emotion recognition: 1 | speech enhancement: 1 | kalman filtering: 1 | backpropagation: 1 | kalman filters: 1 | self attention: 1 | model combination: 1 | speaker diarization: 1 | hidden markov models: 1 | python: 1 | convolution: 1 | delays: 1 | time delay neural network: 1 | resnet: 1 | grid recurrent neural network: 1 | feedforward neural nets: 1 | recurrent neural nets: 1 | optimisation: 1 | probability: 1 | mixture models: 1 | gaussian processes: 1 | audio segmentation: 1 | television broadcasting: 1 | pattern clustering: 1 | audio signal processing: 1 | multi genre broadcast data: 1 | error analysis: 1 | speech coding: 1 | log linear model: 1 | hybrid system: 1 | joint decoding: 1 | tandem system: 1 | structured svm: 1
Most Publications: 2021: 15 | 2022: 11 | 2019: 11 | 2020: 10 | 2015: 8

Affiliations
University of Cambridge, Department of Engineering, UK

SpeechComm2023 Qiujia Li, Chao Zhang 0031, Philip C. Woodland, 
Combining hybrid DNN-HMM ASR systems with attention-based models using lattice rescoring.

Interspeech2022 Guangzhi Sun, Chao Zhang 0031, Philip C. Woodland, 
Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition.

Interspeech2022 Xianrui Zheng, Chao Zhang 0031, Philip C. Woodland, 
Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription.

ICASSP2021 Guangzhi Sun, Chao Zhang 0031, Philip C. Woodland, 
Transformer Language Models with LSTM-Based Cross-Utterance Information Representation.

ICASSP2021 Guangzhi Sun, D. Liu, Chao Zhang 0031, Philip C. Woodland, 
Content-Aware Speaker Embeddings for Speaker Diarisation.

ICASSP2021 Wen Wu, Chao Zhang 0031, Philip C. Woodland, 
Emotion Recognition by Fusing Time Synchronous and Time Asynchronous Representations.

ICASSP2021 Wei Xue, Gang Quan, Chao Zhang 0031, Guohong Ding, Xiaodong He 0001, Bowen Zhou, 
Neural Kalman Filtering for Speech Enhancement.

Interspeech2021 Dongcheng Jiang, Chao Zhang 0031, Philip C. Woodland, 
Variable Frame Rate Acoustic Models Using Minimum Error Reinforcement Learning.

Interspeech2020 Wei Song, Guanghui Xu, Zhengchen Zhang, Chao Zhang 0031, Xiaodong He 0001, Bowen Zhou, 
Efficient WaveGlow: An Improved WaveGlow Vocoder with Enhanced Speed.

Interspeech2020 Ying Tong, Wei Xue, Shanluo Huang, Lu Fan, Chao Zhang 0031, Guohong Ding, Xiaodong He 0001, 
The JD AI Speaker Verification System for the FFSVC 2020 Challenge.

Interspeech2020 Wei Xue, Ying Tong, Chao Zhang 0031, Guohong Ding, Xiaodong He 0001, Bowen Zhou, 
Sound Event Localization and Detection Based on Multiple DOA Beamforming and Multi-Task Learning.

ICASSP2019 Guangzhi Sun, Chao Zhang 0031, Philip C. Woodland, 
Speaker Diarisation Using 2D Self-attentive Combination of Embeddings.

ICASSP2019 Chao Zhang 0031, Florian L. Kreyssig, Qiujia Li, Philip C. Woodland, 
PyHTK: Python Library and ASR Pipelines for HTK.

Interspeech2019 Patrick von Platen, Chao Zhang 0031, Philip C. Woodland, 
Multi-Span Acoustic Modelling Using Raw Waveform Signals.

Interspeech2019 Wei Xue, Ying Tong, Guohong Ding, Chao Zhang 0031, Tao Ma, Xiaodong He 0001, Bowen Zhou, 
Direct-Path Signal Cross-Correlation Estimation for Sound Source Localization in Reverberation.

ICASSP2018 Florian L. Kreyssig, Chao Zhang 0031, Philip C. Woodland, 
Improved Tdnns Using Deep Kernels and Frequency Dependent Grid-RNNS.

ICASSP2018 Chao Zhang 0031, Philip C. Woodland, 
High Order Recurrent Neural Networks for Acoustic Modelling.

Interspeech2018 Yu Wang 0027, Chao Zhang 0031, Mark J. F. Gales, Philip C. Woodland, 
Speaker Adaptation and Adaptive Training for Jointly Optimised Tandem Systems.

Interspeech2018 Chao Zhang 0031, Philip C. Woodland, 
Semi-tied Units for Efficient Gating in LSTM and Highway Networks.

ICASSP2017 Chao Zhang 0031, Philip C. Woodland, 
Joint optimisation of tandem systems using Gaussian mixture density neural network discriminative sequence training.

#188  | Timo Gerkmann | Google Scholar   DBLP
Venues: Interspeech: 9 | ICASSP: 8 | TASLP: 6 | NeurIPS: 1
Years: 2022: 7 | 2021: 3 | 2020: 3 | 2019: 3 | 2018: 3 | 2017: 2 | 2016: 3
ISCA Section: single-channel and multi-channel speech enhancement: 3 | speech enhancement: 2 | dereverberation, noise reduction, and speaker extraction: 1 | (multimodal) speech emotion recognition: 1 | speech and audio source separation and scene analysis: 1 | speech-enhancement: 1
IEEE Keyword: speech enhancement: 12 | wiener filters: 4 | noise reduction: 4 | filtering theory: 3 | least mean squares methods: 3 | bayes methods: 3 | time frequency analysis: 2 | end to end learning: 2 | speech recognition: 2 | gaussian noise: 2 | array signal processing: 2 | spatial filters: 2 | gaussian distribution: 2 | nonlinear filters: 2 | multichannel: 2 | uncertainty: 2 | spectral analysis: 2 | power spectral density: 2 | signal reconstruction: 2 | fourier transforms: 2 | maximum likelihood estimation: 1 | uncertainty estimation: 1 | wiener filter: 1 | deep neural network: 1 | bayesian estimator: 1 | prediction theory: 1 | online algorithm: 1 | dereverberation: 1 | hearing devices: 1 | neural network: 1 | nonlinear spatial filtering: 1 | microphone arrays: 1 | estimation theory: 1 | signal classification: 1 | semi supervised learning: 1 | variational autoencoder: 1 | deep generative model: 1 | statistical distributions: 1 | hearing: 1 | tasnet: 1 | speech separation: 1 | audio coding: 1 | auditory filterbank: 1 | channel bank filters: 1 | nonlinear filtering: 1 | acoustic beamforming: 1 | deep neural networks: 1 | input features: 1 | stochastic processes: 1 | generalization: 1 | super gaussian pdf: 1 | matrix decomposition: 1 | nonnegative matrix factorization: 1 | nonlinear estimation: 1 | distortion: 1 | amplitude estimation: 1 | recursive estimation: 1 | smoothing methods: 1 | error correction: 1 | adaptive estimation: 1 | iir filters: 1 | computational complexity: 1 | random processes: 1 | reliability: 1 | instrumental measures: 1 | level measurement: 1 | perceptual evaluation: 1 | reverberation: 1
Most Publications: 2022: 32 | 2021: 19 | 2023: 17 | 2015: 15 | 2014: 14


ICASSP2022 Huajian Fang, Tal Peer, Stefan Wermter, Timo Gerkmann
Integrating Statistical Uncertainty into Neural Network-Based Speech Enhancement.

ICASSP2022 Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann
Customizable End-To-End Optimization Of Online Neural Network-Supported Dereverberation For Hearing Devices.

Interspeech2022 Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann
Neural Network-augmented Kalman Filtering for Robust Online Speech Dereverberation in Noisy Reverberant Environments.

Interspeech2022 Danilo de Oliveira, Tal Peer, Timo Gerkmann
Efficient Transformer-based Speech Enhancement Using Long Frames and STFT Magnitudes.

Interspeech2022 Navin Raj Prabhu, Guillaume Carbajal, Nale Lehmann-Willenbrock, Timo Gerkmann
End-To-End Label Uncertainty Modeling for Speech-based Arousal Recognition Using Bayesian Neural Networks.

Interspeech2022 Kristina Tesch, Nils-Hendrik Mohrmann, Timo Gerkmann
On the Role of Spatial, Spectral, and Temporal Processing for DNN-based Non-linear Multi-channel Speech Enhancement.

Interspeech2022 Simon Welker, Julius Richter, Timo Gerkmann
Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain.

TASLP2021 Kristina Tesch, Timo Gerkmann
Nonlinear Spatial Filtering in Multichannel Speech Enhancement.

ICASSP2021 Guillaume Carbajal, Julius Richter, Timo Gerkmann
Guided Variational Autoencoder for Speech Enhancement with a Supervised Classifier.

NeurIPS2021 Xiaolin Hu 0001, Kai Li, Weiyi Zhang, Yi Luo, Jean-Marie Lemercier, Timo Gerkmann
Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network.

ICASSP2020 David Ditter, Timo Gerkmann
A Multi-Phase Gammatone Filterbank for Speech Separation Via Tasnet.

ICASSP2020 Kristina Tesch, Timo Gerkmann
Nonlinear Spatial Filtering for Multichannel Speech Enhancement in Inhomogeneous Noise Fields.

Interspeech2020 Julius Richter, Guillaume Carbajal, Timo Gerkmann
Speech Enhancement with Stochastic Temporal Convolutional Networks.

ICASSP2019 Robert Rehr, Timo Gerkmann
An Analysis of Noise-aware Features in Combination with the Size and Diversity of Training Data for DNN-based Speech Enhancement.

Interspeech2019 David Ditter, Timo Gerkmann
Influence of Speaker-Specific Parameters on Speech Separation Systems.

Interspeech2019 Kristina Tesch, Robert Rehr, Timo Gerkmann
On Nonlinear Spatial Filtering in Multichannel Speech Enhancement.

TASLP2018 Martin Krawczyk-Becker, Timo Gerkmann
On Speech Enhancement Under PSD Uncertainty.

TASLP2018 Robert Rehr, Timo Gerkmann
On the Importance of Super-Gaussian Speech Priors for Machine-Learning Based Speech Enhancement.

ICASSP2018 Martin Krawczyk-Becker, Timo Gerkmann
Nonlinear Speech Enhancement Under Speech PSD Uncertainty.

TASLP2017 Robert Rehr, Timo Gerkmann
An Analysis of Adaptive Recursive Smoothing with Applications to Noise PSD Estimation.

#189  | Daisuke Saito | Google Scholar   DBLP
Venues: Interspeech: 19 | TASLP: 4 | ICASSP: 1
Years: 2022: 5 | 2021: 1 | 2020: 4 | 2019: 2 | 2018: 2 | 2017: 4 | 2016: 6
ISCA Section: speech synthesis: 4 | applications in transcription, education and learning: 2 | voice conversion and adaptation: 2 | special session: 2 | spoken language evaluations: 1 | language learning and databases: 1 | applications in education and learning: 1 | multimodal dialogue systems: 1 | voice conversion: 1 | search, computational strategies and language modeling: 1 | models of speech production: 1 | speech recognition for language learning: 1 | articulatory measurements and analysis: 1
IEEE Keyword: speaker recognition: 3 | speech synthesis: 2 | source separation: 2 | voice activity detection: 1 | deep learning (artificial intelligence): 1 | a small amount of data: 1 | vocal tract length normalization: 1 | voice conversion (vc): 1 | spectral differentials: 1 | speech recognition: 1 | music information processing: 1 | music information retrieval: 1 | unison singing: 1 | audio signal processing: 1 | music: 1 | singer diarization: 1 | discriminability: 1 | monaural source separation: 1 | matrix decomposition: 1 | discriminative nmf: 1 | minimum volume nmf: 1 | many to many conversion: 1 | gaussian processes: 1 | voice conversion: 1 | mixture models: 1 | deep neural network: 1 | eigenvalues and eigenfunctions: 1 | gaussian mixture models: 1 | parallel data free: 1 | parallel processing: 1 | eigenvoice: 1 | spoofing attack: 1 | speaker verification: 1
Most Publications: 2018: 13 | 2021: 11 | 2019: 11 | 2016: 11 | 2022: 9

TASLP2022 Gaku Kotani, Daisuke Saito, Nobuaki Minematsu, 
Voice Conversion Based on Deep Neural Networks for Time-Variant Linear Transformations.

TASLP2022 Hitoshi Suda, Daisuke Saito, Satoru Fukayama, Tomoyasu Nakano, Masataka Goto, 
Singer Diarization for Polyphonic Music With Unison Singing.

ICASSP2022 Eisuke Konno, Daisuke Saito, Nobuaki Minematsu, 
Quantifying Discriminability between NMF Bases.

Interspeech2022 Takeru Gorai, Daisuke Saito, Nobuaki Minematsu, 
Text-to-speech synthesis using spectral modeling based on non-negative autoencoder.

Interspeech2022 Takuya Kunihara, Chuanbo Zhu 0001, Daisuke Saito, Nobuaki Minematsu, Noriko Nakanishi, 
Detection of Learners' Listening Breakdown with Oral Dictation and Its Use to Model Listening Skill Improvement Exclusively Through Shadowing.

Interspeech2021 Shintaro Ando, Nobuaki Minematsu, Daisuke Saito
Lexical Density Analysis of Word Productions in Japanese English Using Acoustic Word Embeddings.

Interspeech2020 Tatsuma Ishihara, Daisuke Saito
Attention-Based Speaker Embeddings for One-Shot Voice Conversion.

Interspeech2020 Zhenchao Lin, Ryo Takashima, Daisuke Saito, Nobuaki Minematsu, Noriko Nakanishi, 
Shadowability Annotation with Fine Granularity on L2 Utterances and its Improvement with Native Listeners' Script-Shadowing.

Interspeech2020 Yuma Shirahata, Daisuke Saito, Nobuaki Minematsu, 
Discriminative Method to Extract Coarse Prosodic Structure and its Application for Statistical Phrase/Accent Command Estimation.

Interspeech2020 Hitoshi Suda, Gaku Kotani, Daisuke Saito
Nonparallel Training of Exemplar-Based Voice Conversion System Using INCA-Based Alignment Technique.

TASLP2019 Tetsuya Hashimoto, Daisuke Saito, Nobuaki Minematsu, 
Many-to-Many and Completely Parallel-Data-Free Voice Conversion Based on Eigenspace DNN.

Interspeech2019 Tasavat Trisitichoke, Shintaro Ando, Daisuke Saito, Nobuaki Minematsu, 
Analysis of Native Listeners' Facial Microexpressions While Shadowing Non-Native Speech - Potential of Shadowers' Facial Expressions for Comprehensibility Prediction.

Interspeech2018 Yusuke Inoue 0004, Suguru Kabashima, Daisuke Saito, Nobuaki Minematsu, Kumi Kanamura, Yutaka Yamauchi, 
A Study of Objective Measurement of Comprehensibility through Native Speakers' Shadowing of Learners' Utterances.

Interspeech2018 Yasuhito Ohsugi, Daisuke Saito, Nobuaki Minematsu, 
A Comparative Study of Statistical Conversion of Face to Voice Based on Their Subjective Impressions.

Interspeech2017 Tetsuya Hashimoto, Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu, 
Parallel-Data-Free Many-to-Many Voice Conversion Based on DNN Integrated with Eigenspace Using a Non-Parallel Speech Corpus.

Interspeech2017 Shohei Toyama, Daisuke Saito, Nobuaki Minematsu, 
Use of Global and Acoustic Features Associated with Contextual Factors to Adapt Language Models for Spontaneous Speech Recognition.

Interspeech2017 Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu, 
Acoustic-to-Articulatory Mapping Based on Mixture of Probabilistic Canonical Correlation Analysis.

Interspeech2017 Junwei Yue, Fumiya Shiozawa, Shohei Toyama, Yutaka Yamauchi, Kayoko Ito, Daisuke Saito, Nobuaki Minematsu, 
Automatic Scoring of Shadowing Speech Based on DNN Posteriors and Their DTW.

TASLP2016 Zhizheng Wu 0001, Phillip L. De Leon, Cenk Demiroglu, Ali Khodabakhsh 0001, Simon King, Zhen-Hua Ling, Daisuke Saito, Bryan Stewart, Tomoki Toda, Mirjam Wester, Junichi Yamagishi, 
Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance.

Interspeech2016 Shuju Shi, Yosuke Kashiwagi, Shohei Toyama, Junwei Yue, Yutaka Yamauchi, Daisuke Saito, Nobuaki Minematsu, 
Automatic Assessment and Error Detection of Shadowing Speech: Case of English Spoken by Japanese Learners.

#190  | Oldrich Plchot | Google Scholar   DBLP
Venues: Interspeech: 15 | ICASSP: 8 | TASLP: 1
Years: 2022: 4 | 2020: 4 | 2019: 7 | 2018: 3 | 2017: 2 | 2016: 4
ISCA Section: speaker recognition: 3 | the voices from a distance challenge: 2 | speaker embedding and diarization: 1 | speaker recognition and diarization: 1 | robust speaker recognition: 1 | large-scale evaluation of short-duration speaker verification: 1 | the first dihard speech diarization challenge: 1 | dereverberation: 1 | speaker characterization and recognition: 1 | speaker recognition evaluation: 1 | language recognition: 1 | special session: 1
IEEE Keyword: speaker recognition: 6 | speaker verification: 3 | speech enhancement: 2 | natural language processing: 2 | bayes methods: 2 | embeddings: 2 | entropy: 2 | gaussian processes: 2 | dnn: 2 | multi channel: 1 | array signal processing: 1 | beamforming: 1 | dataset: 1 | multisv: 1 | pattern classification: 1 | topic identification: 1 | text analysis: 1 | gaussian distribution: 1 | unsupervised learning: 1 | bayesian methods: 1 | on the fly data augmentation: 1 | speaker embedding: 1 | specaugment: 1 | convolutional neural nets: 1 | chime: 1 | speaker diarization: 1 | inference mechanisms: 1 | hidden markov models: 1 | hmm: 1 | dihard: 1 | pattern clustering: 1 | variational bayes: 1 | discriminative training: 1 | i vectors: 1 | i vector extractor: 1 | domain adaptation: 1 | neural net architecture: 1 | lid: 1 | recurrent neural nets: 1 | language recognition: 1 | speech recognition: 1 | automatic speaker identification: 1 | i vector: 1 | mixture models: 1 | bottleneck features: 1 | deep neural networks: 1 | microphones: 1 | de reverberation: 1 | denoising: 1
Most Publications: 2019: 14 | 2022: 12 | 2018: 12 | 2016: 9 | 2020: 8

ICASSP2022 Ladislav Mosner, Oldrich Plchot, Lukás Burget, Jan Honza Cernocký, 
Multisv: Dataset for Far-Field Multi-Channel Speaker Verification.

Interspeech2022 Niko Brummer, Albert Swart, Ladislav Mosner, Anna Silnova, Oldrich Plchot, Themos Stafylakis, Lukás Burget, 
Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings.

Interspeech2022 Junyi Peng, Rongzhi Gu, Ladislav Mosner, Oldrich Plchot, Lukás Burget, Jan Cernocký, 
Learnable Sparse Filterbank for Speaker Verification.

Interspeech2022 Themos Stafylakis, Ladislav Mosner, Oldrich Plchot, Johan Rohdin, Anna Silnova, Lukás Burget, Jan Cernocký, 
Training speaker embedding extractors using multi-speaker audio with unknown speaker boundaries.

TASLP2020 Santosh Kesiraju, Oldrich Plchot, Lukás Burget, Suryakanth V. Gangashetty, 
Learning Document Embeddings Along With Their Uncertainties.

ICASSP2020 Shuai Wang 0016, Johan Rohdin, Oldrich Plchot, Lukás Burget, Kai Yu 0004, Jan Cernocký, 
Investigation of Specaugment for Deep Speaker Embedding Learning.

ICASSP2020 Federico Landini, Shuai Wang 0016, Mireia Díez, Lukás Burget, Pavel Matejka, Katerina Zmolíková, Ladislav Mosner, Anna Silnova, Oldrich Plchot, Ondrej Novotný, Hossein Zeinali, Johan Rohdin, 
But System for the Second Dihard Speech Diarization Challenge.

Interspeech2020 Alicia Lozano-Diez, Anna Silnova, Bhargav Pulugundla, Johan Rohdin, Karel Veselý, Lukás Burget, Oldrich Plchot, Ondrej Glembek, Ondrej Novotný, Pavel Matejka, 
BUT Text-Dependent Speaker Verification System for SdSV Challenge 2020.

ICASSP2019 Ondrej Novotný, Oldrich Plchot, Ondrej Glembek, Lukás Burget, Pavel Matejka, 
Discriminatively Re-trained I-vector Extractor for Speaker Recognition.

ICASSP2019 Johan Rohdin, Themos Stafylakis, Anna Silnova, Hossein Zeinali, Lukás Burget, Oldrich Plchot, 
Speaker Verification Using End-to-end Adversarial Language Adaptation.

Interspeech2019 Pavel Matejka, Oldrich Plchot, Hossein Zeinali, Ladislav Mosner, Anna Silnova, Lukás Burget, Ondrej Novotný, Ondrej Glembek, 
Analysis of BUT Submission in Far-Field Scenarios of VOiCES 2019 Challenge.

Interspeech2019 Ondrej Novotný, Oldrich Plchot, Ondrej Glembek, Lukás Burget, 
Factorization of Discriminatively Trained i-Vector Extractor for Speaker Recognition.

Interspeech2019 Themos Stafylakis, Johan Rohdin, Oldrich Plchot, Petr Mizera, Lukás Burget, 
Self-Supervised Speaker Embeddings.

Interspeech2019 Shuai Wang 0016, Johan Rohdin, Lukás Burget, Oldrich Plchot, Yanmin Qian, Kai Yu 0004, Jan Cernocký, 
On the Usage of Phonetic Information for Text-Independent Speaker Embedding Extraction.

ICASSP2018 Alicia Lozano-Diez, Oldrich Plchot, Pavel Matejka, Joaquin Gonzalez-Rodriguez, 
DNN Based Embeddings for Language Recognition.

Interspeech2018 Mireia Díez, Federico Landini, Lukás Burget, Johan Rohdin, Anna Silnova, Katerina Zmolíková, Ondrej Novotný, Karel Veselý, Ondrej Glembek, Oldrich Plchot, Ladislav Mosner, Pavel Matejka, 
BUT System for DIHARD Speech Diarization Challenge 2018.

Interspeech2018 Ladislav Mosner, Oldrich Plchot, Pavel Matejka, Ondrej Novotný, Jan Cernocký, 
Dereverberation and Beamforming in Robust Far-Field Speaker Recognition.

Interspeech2017 Pavel Matejka, Ondrej Novotný, Oldrich Plchot, Lukás Burget, Mireia Díez Sánchez, Jan Cernocký, 
Analysis of Score Normalization in Multilingual Speaker Recognition.

Interspeech2017 Oldrich Plchot, Pavel Matejka, Anna Silnova, Ondrej Novotný, Mireia Díez Sánchez, Johan Rohdin, Ondrej Glembek, Niko Brümmer, Albert Swart, Jesús Jorrín-Prieto, Paola García, Luis Buera, Patrick Kenny, Md. Jahangir Alam, Gautam Bhattacharya, 
Analysis and Description of ABC Submission to NIST SRE 2016.

#191  | Liang Lu 0001 | Google Scholar   DBLP
Venues: ICASSP: 12; Interspeech: 12
Years: 2022: 2; 2021: 6; 2020: 6; 2019: 2; 2018: 1; 2017: 3; 2016: 4
ISCA Sections: asr neural network architectures: 2; streaming for asr/rnn transducers: 1; multi- and cross-lingual asr, other topics in asr: 1; neural network training methods for asr: 1; asr model training and strategies: 1; streaming asr: 1; search for speech recognition: 1; neural network acoustic models for asr: 1; discriminative training for asr: 1; new trends in neural networks for speech recognition: 1; neural networks in speech recognition: 1
IEEE Keywords: speech recognition: 9; recurrent neural nets: 6; speaker recognition: 3; natural language processing: 3; feedforward neural nets: 3; multi talker asr: 2; probability: 2; streaming: 1; end to end end point detection: 1; dual path rnn: 1; transducer: 1; long form meeting transcription: 1; speaker counting: 1; bayes methods: 1; minimum bayes risk training: 1; speech separation: 1; speaker identification: 1; recurrent neural network transducer: 1; attention based encoder decoder: 1; language model: 1; regularization: 1; sequence training: 1; deep neural network: 1; self teaching: 1; continuous speech separation: 1; libricss: 1; microphones: 1; overlapped speech: 1; automatic speech recognition: 1; permutation invariant training: 1; audio signal processing: 1; streaming attention based sequence to sequence asr: 1; decoding: 1; pattern classification: 1; encoding: 1; latency reduction: 1; monotonic chunkwise attention: 1; senone classification: 1; future context frames: 1; lstm: 1; signal classification: 1; layer trajectory: 1; temporal modeling: 1; lexicon free recognition: 1; long short term memory: 1; convolution: 1; conversational speech recognition: 1; connectionist temporal classification: 1; convolutional neural networks: 1; small footprint: 1; highway deep neural networks: 1; gaussian processes: 1; mixture models: 1; knowledge distillation: 1; speech coding: 1; encoder decoder: 1; recurrent neural networks: 1; hidden markov models: 1; deep neural networks: 1; end to end speech recognition: 1; signal representation: 1; i vector: 1; lstm rnns: 1; speaker adaptation: 1; speaking rate: 1; speaker aware training: 1
Most Publications: 2020: 15; 2021: 12; 2017: 11; 2016: 10; 2019: 6

Affiliations
Microsoft, Bellevue, WA, USA
Toyota Technological Institute at Chicago, Chicago, IL, USA (former)
University of Edinburgh, Centre for Speech Technology Research, UK (former)

ICASSP2022 Liang Lu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Endpoint Detection for Streaming End-to-End Multi-Talker ASR.

ICASSP2022 Desh Raj, Liang Lu 0001, Zhuo Chen 0006, Yashesh Gaur, Jinyu Li 0001, 
Continuous Streaming Multi-Talker ASR with Dual-Path Transducers.

ICASSP2021 Naoyuki Kanda, Zhong Meng, Liang Lu 0001, Yashesh Gaur, Xiaofei Wang 0009, Zhuo Chen 0006, Takuya Yoshioka, 
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.

ICASSP2021 Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu 0001, Xie Chen 0001, Jinyu Li 0001, Yifan Gong 0001, 
Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition.

ICASSP2021 Eric Sun, Liang Lu 0001, Zhong Meng, Yifan Gong 0001, 
Sequence-Level Self-Teaching Regularization.

Interspeech2021 Liang Lu 0001, Naoyuki Kanda, Jinyu Li 0001, Yifan Gong 0001, 
Streaming Multi-Talker Speech Recognition with Joint Speaker Identification.

Interspeech2021 Liang Lu 0001, Zhong Meng, Naoyuki Kanda, Jinyu Li 0001, Yifan Gong 0001, 
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer.

Interspeech2021 Zhong Meng, Yu Wu 0012, Naoyuki Kanda, Liang Lu 0001, Xie Chen 0001, Guoli Ye, Eric Sun, Jinyu Li 0001, Yifan Gong 0001, 
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition.

ICASSP2020 Zhuo Chen 0006, Takuya Yoshioka, Liang Lu 0001, Tianyan Zhou, Zhong Meng, Yi Luo 0004, Jian Wu 0027, Xiong Xiao, Jinyu Li 0001, 
Continuous Speech Separation: Dataset and Analysis.

ICASSP2020 Hirofumi Inaguma, Yashesh Gaur, Liang Lu 0001, Jinyu Li 0001, Yifan Gong 0001, 
Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR.

Interspeech2020 Chengyi Wang 0002, Yu Wu 0012, Yujiao Du, Jinyu Li 0001, Shujie Liu 0001, Liang Lu 0001, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou 0001, 
Semantic Mask for Transformer Based End-to-End Speech Recognition.

Interspeech2020 Chengyi Wang 0002, Yu Wu 0012, Liang Lu 0001, Shujie Liu 0001, Jinyu Li 0001, Guoli Ye, Ming Zhou 0001, 
Low Latency End-to-End Streaming Speech Recognition with a Scout Network.

Interspeech2020 Liang Lu 0001, Changliang Liu, Jinyu Li 0001, Yifan Gong 0001, 
Exploring Transformers for Large-Scale Speech Recognition.

Interspeech2020 Jeremy H. M. Wong, Yashesh Gaur, Rui Zhao 0017, Liang Lu 0001, Eric Sun, Jinyu Li 0001, Yifan Gong 0001, 
Combination of End-to-End and Hybrid Models for Speech Recognition.

ICASSP2019 Jinyu Li 0001, Liang Lu 0001, Changliang Liu, Yifan Gong 0001, 
Improving Layer Trajectory LSTM with Future Context Frames.

Interspeech2019 Liang Lu 0001, Eric Sun, Yifan Gong 0001, 
Self-Teaching Networks.

ICASSP2018 Kalpesh Krishna, Liang Lu 0001, Kevin Gimpel, Karen Livescu, 
A Study of All-Convolutional Encoders for Connectionist Temporal Classification.

ICASSP2017 Liang Lu 0001, Michelle Guo, Steve Renals, 
Knowledge distillation for small-footprint highway networks.

Interspeech2017 Liang Lu 0001, Lingpeng Kong, Chris Dyer, Noah A. Smith, 
Multitask Learning with CTC and Segmental CRF for Speech Recognition.

Interspeech2017 Shubham Toshniwal, Hao Tang 0002, Liang Lu 0001, Karen Livescu, 
Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition.

#192  | Tim Fingscheidt | Google Scholar   DBLP
Venues: ICASSP: 11; Interspeech: 7; TASLP: 6
Years: 2022: 2; 2021: 4; 2020: 5; 2019: 2; 2018: 5; 2017: 2; 2016: 4
ISCA Sections: neural network training methods and architectures for asr: 1; interspeech 2021 acoustic echo cancellation challenge: 1; interspeech 2021 deep noise suppression challenge: 1; asr neural network architectures: 1; deep noise suppression challenge: 1; speaker state and trait: 1; dereverberation, echo cancellation and speech: 1
IEEE Keywords: speech enhancement: 10; speech coding: 4; convolutional neural nets: 3; artificial speech bandwidth extension: 3; spectral analysis: 3; mean square error methods: 2; deep learning (artificial intelligence): 2; noise reduction: 2; filtering theory: 2; echo suppression: 2; recurrent neural nets: 2; hidden markov models: 2; estimation theory: 2; social signal processing: 2; speech recognition: 2; pesq: 1; non intrusive pesq estimation: 1; deep noise suppression: 1; convolutional recurrent neural network: 1; adaptive kalman filters: 1; residual echo suppression: 1; frequency domain analysis: 1; fully convolutional recurrent network: 1; adaptive filters: 1; loudspeakers: 1; convolutional neural network: 1; convlstm: 1; acoustic echo cancellation: 1; sed: 1; acoustic event detection: 1; sound event detection: 1; rare event detection: 1; crnn: 1; aed: 1; dcase 2017 challenge: 1; knowledge based systems: 1; expectation maximisation algorithm: 1; convolutional lstm: 1; convolutional recurrent neural networks: 1; components loss function: 1; time frequency analysis: 1; mask based speech enhancement: 1; cnn: 1; sinusoidal: 1; speech intelligibility: 1; lowband: 1; speech synthesis: 1; convolutional codes: 1; adaptive codes: 1; cepstral analysis: 1; cmos integrated circuits: 1; speech codecs: 1; time domain analysis: 1; convolutional neural networks: 1; least mean squares methods: 1; a priori snr: 1; regression analysis: 1; machine learning: 1; deep neural network: 1; regression: 1; microphones: 1; crosstalk: 1; multichannel speaker activity detection: 1; speaker recognition: 1; meeting: 1; voice activity detection: 1; matrix decomposition: 1; a priori snr: 1; discriminative non negative matrix factorization: 1; support vector machines: 1; objective speech quality assessment: 1; perceptual model: 1; audio visual systems: 1; iterative decoding: 1; multimedia systems: 1; robustness: 1; turbo codes: 1; viterbi decoding: 1; acr: 1; artificial bandwidth extension: 1; natural language processing: 1; listening test: 1; correlation methods: 1; bayesian model selection: 1; instrumental speech quality measures: 1; bayes methods: 1; emotion recognition: 1; linear discriminant analysis: 1; ambiguous labels: 1; ground truth: 1
Most Publications: 2021: 32; 2020: 31; 2022: 25; 2019: 20; 2018: 18

Affiliations
URLs

TASLP2022 Ziyi Xu, Maximilian Strake, Tim Fingscheidt
Deep Noise Suppression Maximizing Non-Differentiable PESQ Mediated by a Non-Intrusive PESQNet.

ICASSP2022 Jan Franzen, Tim Fingscheidt
Deep Residual Echo Suppression and Noise Reduction: A Multi-Input FCRN Approach in a Hybrid Speech Enhancement System.

ICASSP2021 Jan Franzen, Ernst Seidel, Tim Fingscheidt
AEC in A Netshell: on Target and Topology Choices for FCRN Acoustic Echo Cancellation.

Interspeech2021 Timo Lohrenz, Zhengyang Li, Tim Fingscheidt
Multi-Encoder Learning and Stream Fusion for Transformer-Based End-to-End Automatic Speech Recognition.

Interspeech2021 Ernst Seidel, Jan Franzen, Maximilian Strake, Tim Fingscheidt
Y2-Net FCRN for Acoustic Echo and Noise Suppression.

Interspeech2021 Ziyi Xu, Maximilian Strake, Tim Fingscheidt
Deep Noise Suppression with Non-Intrusive PESQNet Supervision Enabling the Use of Real Training Data.

ICASSP2020 Jan Baumann, Timo Lohrenz, Alexander Roy, Tim Fingscheidt
Beyond the Dcase 2017 Challenge on Rare Sound Event Detection: A Proposal for a More Realistic Training and Test Framework.

ICASSP2020 Maximilian Strake, Bruno Defraene, Kristoff Fluyt, Wouter Tirry, Tim Fingscheidt
Fully Convolutional Recurrent Networks for Speech Enhancement.

ICASSP2020 Ziyi Xu, Samy Elshamy, Tim Fingscheidt
Using Separate Losses for Speech and Noise in Mask-Based Speech Enhancement.

Interspeech2020 Timo Lohrenz, Tim Fingscheidt
BLSTM-Driven Stream Fusion for Automatic Speech Recognition: Novel Methods and a Multi-Size Window Fusion Example.

Interspeech2020 Maximilian Strake, Bruno Defraene, Kristoff Fluyt, Wouter Tirry, Tim Fingscheidt
INTERSPEECH 2020 Deep Noise Suppression Challenge: A Fully Convolutional Recurrent Network (FCRN) for Joint Dereverberation and Denoising.

TASLP2019 Johannes Abel, Tim Fingscheidt
Sinusoidal-Based Lowband Synthesis for Artificial Speech Bandwidth Extension.

TASLP2019 Ziyue Zhao, Huijun Liu 0001, Tim Fingscheidt
Convolutional Neural Networks to Enhance Coded Speech.

TASLP2018 Samy Elshamy, Nilesh Madhu, Wouter Tirry, Tim Fingscheidt
DNN-Supported Speech Enhancement With Cepstral Estimation of Both Excitation and Envelope.

ICASSP2018 Johannes Abel, Maximilian Strake, Tim Fingscheidt
A Simple Cepstral Domain DNN Approach to Artificial Speech Bandwidth Extension.

ICASSP2018 Patrick Meyer, Rolf Jongebloed, Tim Fingscheidt
Multichannel Speaker Activity Detection for Meetings.

ICASSP2018 Ziyi Xu, Samy Elshamy, Tim Fingscheidt
A Priori SNR Estimation Using Discriminative Non-Negative Matrix Factorization.

Interspeech2018 Patrick Meyer, Eric Buschermöhle, Tim Fingscheidt
What Do Classifiers Actually Learn? A Case Study on Emotion Recognition Datasets.

TASLP2017 Johannes Abel, Magdalena Kaniewska, Cyril Guillaume, Wouter Tirry, Tim Fingscheidt
An Instrumental Quality Measure for Artificially Bandwidth-Extended Speech Signals.

Interspeech2017 Jan Franzen, Tim Fingscheidt
A Delay-Flexible Stereo Acoustic Echo Cancellation for DFT-Based In-Car Communication (ICC) Systems.

#193  | Xurong Xie | Google Scholar   DBLP
Venues: Interspeech: 15; TASLP: 5; ICASSP: 4
Years: 2022: 5; 2021: 9; 2020: 2; 2019: 4; 2018: 2; 2017: 1; 2016: 1
ISCA Sections: speech recognition of atypical speech: 4; multi-, cross-lingual and other topics in asr: 2; topics in asr: 2; novel models and training methods for asr: 1; asr neural network architectures: 1; model adaptation for asr: 1; novel neural network architectures for acoustic modelling: 1; application of asr in medical practice: 1; acoustic model adaptation: 1; speech synthesis: 1
IEEE Keywords: speech recognition: 6; bayes methods: 6; speaker recognition: 5; speaker adaptation: 3; bayesian learning: 3; natural language processing: 3; disordered speech recognition: 2; handicapped aids: 2; time delay neural network: 2; deep learning (artificial intelligence): 2; neural architecture search: 2; inference mechanisms: 2; lhuc: 2; gaussian processes: 2; elderly speech recognition: 1; neural net architecture: 1; search problems: 1; minimisation: 1; uncertainty handling: 1; domain adaptation: 1; variational inference: 1; gaussian process: 1; lf mmi: 1; delays: 1; generalisation (artificial intelligence): 1; data augmentation: 1; multimodal speech recognition: 1; tdnn: 1; adaptation: 1; switchboard: 1; elderly speech: 1; automatic speech recognition: 1; neurocognitive disorder detection: 1; dementia: 1; activation function selection: 1; gaussian process neural network: 1; bayesian neural network: 1; maximum likelihood estimation: 1; hidden markov models: 1
Most Publications: 2022: 21; 2021: 11; 2023: 6; 2020: 4; 2019: 4

Affiliations
URLs

TASLP2022 Mengzhe Geng, Xurong Xie, Zi Ye, Tianzi Wang, Guinan Li, Shujie Hu, Xunying Liu, Helen Meng, 
Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition.

TASLP2022 Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng, 
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.

Interspeech2022 Mingyu Cui, Jiajun Deng, Shoukang Hu, Xurong Xie, Tianzi Wang, Shujie Hu, Mengzhe Geng, Boyang Xue, Xunying Liu, Helen Meng, 
Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems.

Interspeech2022 Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Mengzhe Geng, Guinan Li, Xunying Liu, Helen Meng, 
Confidence Score Based Conformer Speaker Adaptation for Speech Recognition.

Interspeech2022 Jin Li, Rongfeng Su, Xurong Xie, Lan Wang, Nan Yan, 
A Multi-level Acoustic Feature Extraction Framework for Transformer Based End-to-End Speech Recognition.

TASLP2021 Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng, 
Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition.

TASLP2021 Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng, 
Recent Progress in the CUHK Dysarthric Speech Recognition System.

TASLP2021 Xurong Xie, Xunying Liu, Tan Lee, Lan Wang, 
Bayesian Learning for Deep Neural Network Adaptation.

ICASSP2021 Shoukang Hu, Xurong Xie, Shansong Liu, Mingyu Cui, Mengzhe Geng, Xunying Liu, Helen Meng, 
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.

ICASSP2021 Zi Ye, Shoukang Hu, Jinchao Li, Xurong Xie, Mengzhe Geng, Jianwei Yu, Junhao Xu, Boyang Xue, Shansong Liu, Xunying Liu, Helen Meng, 
Development of the Cuhk Elderly Speech Recognition System for Neurocognitive Disorder Detection Using the Dementiabank Corpus.

Interspeech2021 Jiajun Deng, Fabian Ritter Gutierrez, Shoukang Hu, Mengzhe Geng, Xurong Xie, Zi Ye, Shansong Liu, Jianwei Yu, Xunying Liu, Helen Meng, 
Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and Dysarthric Speech Recognition.

Interspeech2021 Mengzhe Geng, Shansong Liu, Jianwei Yu, Xurong Xie, Shoukang Hu, Zi Ye, Zengrui Jin, Xunying Liu, Helen Meng, 
Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition.

Interspeech2021 Zengrui Jin, Mengzhe Geng, Xurong Xie, Jianwei Yu, Shansong Liu, Xunying Liu, Helen Meng, 
Adversarial Data Augmentation for Disordered Speech Recognition.

Interspeech2021 Xurong Xie, Rukiye Ruzi, Xunying Liu, Lan Wang, 
Variational Auto-Encoder Based Variability Encoding for Dysarthric Speech Recognition.

Interspeech2020 Mengzhe Geng, Xurong Xie, Shansong Liu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng, 
Investigation of Data Augmentation Techniques for Disordered Speech Recognition.

Interspeech2020 Shansong Liu, Xurong Xie, Jianwei Yu, Shoukang Hu, Mengzhe Geng, Rongfeng Su, Shi-Xiong Zhang, Xunying Liu, Helen Meng, 
Exploiting Cross-Domain Visual Feature Generation for Disordered Speech Recognition.

ICASSP2019 Shoukang Hu, Max W. Y. Lam, Xurong Xie, Shansong Liu, Jianwei Yu, Xixin Wu, Xunying Liu, Helen Meng, 
Bayesian and Gaussian Process Neural Networks for Large Vocabulary Continuous Speech Recognition.

ICASSP2019 Xurong Xie, Xunying Liu, Tan Lee, Shoukang Hu, Lan Wang, 
BLHUC: Bayesian Learning of Hidden Unit Contributions for Deep Neural Network Speaker Adaptation.

Interspeech2019 Shoukang Hu, Xurong Xie, Shansong Liu, Max W. Y. Lam, Jianwei Yu, Xixin Wu, Xunying Liu, Helen Meng, 
LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition.

Interspeech2019 Xurong Xie, Xunying Liu, Tan Lee, Lan Wang, 
Fast DNN Acoustic Model Speaker Adaptation by Learning Hidden Unit Contribution Features.

#194  | Yi-Chiao Wu | Google Scholar   DBLP
Venues: Interspeech: 15; ICASSP: 6; TASLP: 3
Years: 2022: 2; 2021: 7; 2020: 5; 2019: 4; 2018: 2; 2017: 3; 2016: 1
ISCA Sections: speech synthesis: 5; neural techniques for voice conversion and waveform generation: 2; voice conversion: 2; voice conversion and adaptation: 1; the zero resource speech challenge 2020: 1; neural waveform generation: 1; voice conversion and speech synthesis: 1; speech-enhancement: 1; special session: 1
IEEE Keywords: voice conversion: 4; speaker recognition: 3; speech synthesis: 3; vocoders: 3; signal denoising: 2; recurrent neural nets: 2; pitch dependent dilated convolution: 2; autoregressive processes: 2; convolutional neural nets: 2; neural vocoder: 2; speech coding: 2; noisy to noisy vc: 1; voice conversion (vc): 1; natural language processing: 1; noisy speech modeling: 1; sequence to sequence: 1; transformer: 1; pretraining: 1; speech recognition: 1; parallel wavegan: 1; quasi periodic wavenet: 1; pitch controllability: 1; wavenet: 1; vocoder: 1; audio signal processing: 1; quasi periodic structure: 1; any to one voice conversion: 1; signal representation: 1; self supervised speech representation: 1; sequence to sequence modeling: 1; vq wav2vec: 1; open source software: 1; gaussian processes: 1; vector quantized variational autoencoder: 1; nonparallel: 1; prediction theory: 1; shallow model: 1; laplacian distribution: 1; wavenet vocoder: 1; multiple samples output: 1; linear prediction: 1; oversmoothed parameters: 1; wavenet fine tuning: 1; cyclic recurrent neural network: 1; filtering theory: 1; deep neural network: 1; postfiltering: 1; speech enhancement: 1; locally linear embedding: 1; spectral analysis: 1
Most Publications: 2021: 16; 2020: 16; 2019: 12; 2018: 9; 2022: 8

Affiliations
URLs

ICASSP2022 Chao Xie, Yi-Chiao Wu, Patrick Lumban Tobing, Wen-Chin Huang, Tomoki Toda, 
Direct Noisy Speech Modeling for Noisy-To-Noisy Voice Conversion.

Interspeech2022 Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda, 
Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation.

TASLP2021 Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda, 
Pretraining Techniques for Sequence-to-Sequence Voice Conversion.

TASLP2021 Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda, 
Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.

TASLP2021 Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda, 
Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network.

ICASSP2021 Wen-Chin Huang, Yi-Chiao Wu, Tomoki Hayashi, 
Any-to-One Sequence-to-Sequence Voice Conversion Using Self-Supervised Discrete Speech Representations.

ICASSP2021 Kazuhiro Kobayashi, Wen-Chin Huang, Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda, 
Crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder.

Interspeech2021 Yi-Chiao Wu, Cheng-Hung Hu, Hung-Shin Lee, Yu-Huai Peng, Wen-Chin Huang, Yu Tsao 0001, Hsin-Min Wang, Tomoki Toda, 
Relational Data Selection for Data Augmentation of Speaker-Dependent Multi-Band MelGAN Vocoder.

Interspeech2021 Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda, 
Unified Source-Filter GAN: Unified Source-Filter Network Based On Factorization of Quasi-Periodic Parallel WaveGAN.

ICASSP2020 Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, 
Efficient Shallow Wavenet Vocoder Using Multiple Samples Output Based on Laplacian Distribution and Linear Prediction.

Interspeech2020 Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda, 
Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining.

Interspeech2020 Patrick Lumban Tobing, Tomoki Hayashi, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda, 
Cyclic Spectral Modeling for Unsupervised Unit Discovery into Voice Conversion with Excitation and Waveform Modeling.

Interspeech2020 Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda, 
Quasi-Periodic Parallel WaveGAN Vocoder: A Non-Autoregressive Pitch-Dependent Dilated Convolution Model for Parametric Speech Generation.

Interspeech2020 Yi-Chiao Wu, Patrick Lumban Tobing, Kazuki Yasuhara, Noriyuki Matsunaga, Yamato Ohtani, Tomoki Toda, 
A Cyclical Post-Filtering Approach to Mismatch Refinement of Neural Vocoder for Text-to-Speech Systems.

ICASSP2019 Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, 
Voice Conversion with Cyclic Recurrent Neural Network and Fine-tuned Wavenet Vocoder.

Interspeech2019 Wen-Chin Huang, Yi-Chiao Wu, Chen-Chou Lo, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao 0001, Hsin-Min Wang, 
Investigation of F0 Conditioning and Fully Convolutional Networks in Variational Autoencoder Based Voice Conversion.

Interspeech2019 Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, 
Non-Parallel Voice Conversion with Cyclic Variational Autoencoder.

Interspeech2019 Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda, 
Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation.

Interspeech2018 Yu-Huai Peng, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao 0001, Hsin-Min Wang, 
Exemplar-Based Spectral Detail Compensation for Voice Conversion.

Interspeech2018 Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Hayashi, Patrick Lumban Tobing, Tomoki Toda, 
Collapsed Speech Segment Detection and Suppression for WaveNet Vocoder.

#195  | Shiyin Kang | Google Scholar   DBLP
Venues: Interspeech: 12; ICASSP: 11; TASLP: 1
Years: 2022: 5; 2021: 4; 2020: 5; 2019: 5; 2018: 3; 2016: 2
ISCA Sections: speech synthesis: 5; voice conversion and adaptation: 2; non-autoregressive sequential modeling for speech processing: 1; speech synthesis paradigms and methods: 1; speech-to-text and speech assessment: 1; neural techniques for voice conversion and waveform generation: 1; expressive speech synthesis: 1
IEEE Keywords: speech synthesis: 8; natural language processing: 5; speech recognition: 5; recurrent neural nets: 5; speech coding: 4; speaker recognition: 4; expressive speech synthesis: 2; voice conversion: 2; vocoders: 2; text analysis: 1; hierarchical: 1; xlnet: 1; knowledge distillation: 1; speaking style modelling: 1; decoding: 1; hybrid bottleneck features: 1; voice activity detection: 1; disentangling: 1; entropy: 1; cross entropy: 1; connectionist temporal classification: 1; emotion recognition: 1; residual error: 1; capsule: 1; speech emotion recognition: 1; exemplary emotion descriptor: 1; multi speaker and multi style tts: 1; durian: 1; hifi gan: 1; low resource condition: 1; speech intelligibility: 1; phonetic posteriorgrams: 1; code switching: 1; accented speech recognition: 1; accent conversion: 1; audio visual systems: 1; overlapped speech: 1; speech separation: 1; audio visual speech recognition: 1; multi modal: 1; optimisation: 1; self attention: 1; wavenet: 1; blstm: 1; phonetic posteriorgrams (ppgs): 1; quasifully recurrent neural network (qrnn): 1; convolutional neural nets: 1; variational inference: 1; text to speech (tts) synthesis: 1; parallel wavenet: 1; convolutional neural network (cnn): 1; parallel processing: 1; style adaptation: 1; regression analysis: 1; expressiveness: 1; speaking style: 1; style feature: 1; probability: 1; importance sampling: 1; vocabulary: 1; recurrent neural networks: 1; automatic speech recognition: 1; language modeling: 1; multilingual: 1; cross lingual: 1; low resource: 1; bidirectional long short term memory (blstm): 1
Most Publications: 2022: 11; 2019: 9; 2020: 8; 2021: 7; 2018: 4

Affiliations
URLs

ICASSP2022 Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.

ICASSP2022 Xintao Zhao, Feng Liu, Changhe Song, Zhiyong Wu 0001, Shiyin Kang, Deyi Tuo, Helen Meng, 
Disentangling Content and Fine-Grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion.

Interspeech2022 Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu 0001, Helen Meng, 
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information.

Interspeech2022 Shun Lei, Yixuan Zhou, Liyang Chen, Jiankun Hu, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.

Interspeech2022 Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information.

TASLP2021 Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Shiyin Kang, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Exemplar-Based Emotive Speech Synthesis.

ICASSP2021 Jie Wang, Yuren You, Feng Liu, Deyi Tuo, Shiyin Kang, Zhiyong Wu 0001, Helen Meng, 
The Huya Multi-Speaker and Multi-Style Speech Synthesis System for M2voc Challenge 2020.

Interspeech2021 Hui Lu, Zhiyong Wu 0001, Xixin Wu, Xu Li, Shiyin Kang, Xunying Liu, Helen Meng, 
VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis.

Interspeech2021 Jie Wang, Jingbei Li, Xintao Zhao, Zhiyong Wu 0001, Shiyin Kang, Helen Meng, 
Adversarially Learning Disentangled Speech Representations for Robust Multi-Factor Voice Conversion.

ICASSP2020 Yuewen Cao, Songxiang Liu, Xixin Wu, Shiyin Kang, Peng Liu, Zhiyong Wu 0001, Xunying Liu, Dan Su 0002, Dong Yu 0001, Helen Meng, 
Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.

ICASSP2020 Songxiang Liu, Disong Wang, Yuewen Cao, Lifa Sun, Xixin Wu, Shiyin Kang, Zhiyong Wu 0001, Xunying Liu, Dan Su 0002, Dong Yu 0001, Helen Meng, 
End-To-End Accent Conversion Without Using Native Utterances.

ICASSP2020 Jianwei Yu, Shi-Xiong Zhang, Jian Wu 0027, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu 0001, 
Audio-Visual Recognition of Overlapped Speech for the LRS2 Dataset.

Interspeech2020 Songxiang Liu, Yuewen Cao, Shiyin Kang, Na Hu, Xunying Liu, Dan Su 0002, Dong Yu 0001, Helen Meng, 
Transferring Source Style in Non-Parallel Voice Conversion.

Interspeech2020 Chengzhu Yu, Heng Lu 0004, Na Hu, Meng Yu 0003, Chao Weng, Kun Xu 0005, Peng Liu, Deyi Tuo, Shiyin Kang, Guangzhi Lei, Dan Su 0002, Dong Yu 0001, 
DurIAN: Duration Informed Attention Network for Speech Synthesis.

ICASSP2019 Hui Lu, Zhiyong Wu 0001, Runnan Li, Shiyin Kang, Jia Jia 0001, Helen Meng, 
A Compact Framework for Voice Conversion Using Wavenet Conditioned on Phonetic Posteriorgrams.

ICASSP2019 Mu Wang, Xixin Wu, Zhiyong Wu 0001, Shiyin Kang, Deyi Tuo, Guangzhi Li, Dan Su 0002, Dong Yu 0001, Helen Meng, 
Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis.

Interspeech2019 Dongyang Dai, Zhiyong Wu 0001, Shiyin Kang, Xixin Wu, Jia Jia 0001, Dan Su 0002, Dong Yu 0001, Helen Meng, 
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT.

Interspeech2019 Shen Huang, Bojie Hu, Shan Huang, Pengfei Hu, Jian Kang 0006, Zhiqiang Lv, Jinghao Yan, Qi Ju, Shiyin Kang, Deyi Tuo, Guangzhi Li, Nurmemet Yolwas, 
Multimedia Simultaneous Translation System for Minority Language Communication with Mandarin.

Interspeech2019 Hui Lu, Zhiyong Wu 0001, Dongyang Dai, Runnan Li, Shiyin Kang, Jia Jia 0001, Helen Meng, 
One-Shot Voice Conversion with Global Speaker Embeddings.

ICASSP2018 Xixin Wu, Lifa Sun, Shiyin Kang, Songxiang Liu, Zhiyong Wu 0001, Xunying Liu, Helen Meng, 
Feature Based Adaptation for Speaking Style Synthesis.

#196  | Zejun Ma | Google Scholar   DBLP
Venues: ICASSP: 12; Interspeech: 10; AAAI: 1; KDD: 1
Years: 2022: 13; 2021: 9; 2020: 2
ISCA Sections: speech synthesis: 2; speech segmentation: 1; applications in transcription, education and learning: 1; neural transducers, streaming asr and novel asr models: 1; asr: 1; speech enhancement and intelligibility: 1; neural network training methods for asr: 1; voice activity detection and keyword spotting: 1; streaming for asr/rnn transducers: 1
IEEE Keywords: natural language processing: 6; speech recognition: 5; audio signal processing: 4; music: 3; speech synthesis: 3; text analysis: 3; representation learning: 2; speech coding: 2; token semantic module: 1; sound event detection: 1; convolutional neural nets: 1; graphics processing units: 1; transformer: 1; signal classification: 1; semantic networks: 1; audio classification: 1; contextual biasing: 1; collaborative decoding: 1; contextual speech recognition: 1; knowledge selection: 1; semi supervised learning (artificial intelligence): 1; pseudo labeling: 1; signal representation: 1; end to end model: 1; semi supervised learning: 1; unsupervised learning: 1; multilingual: 1; language adaptation: 1; multi channel multi speaker speech recognition: 1; alimeeting: 1; speaker diarization: 1; m2met: 1; data augmentation: 1; speaker recognition: 1; direction of arrival estimation: 1; reverberation: 1; pattern classification: 1; bayes methods: 1; melody extraction: 1; pose estimation: 1; audio recording: 1; high resolution network (hrnet): 1; pitch refinement: 1; approximation theory: 1; melody midi: 1; cross modal learning: 1; audio visual systems: 1; rule embedding: 1; multi modal fusion: 1; audio visual voice detection: 1; voice activity detection: 1; video signal processing: 1; video streaming: 1; confusion module: 1; hidden markov models: 1; phonetic posteriorgrams: 1; singing voice conversion: 1; emotion recognition: 1; audiobook: 1; text to speech: 1; speaker determination: 1; emotion classification: 1; keyword spotting: 1; recurrent neural nets: 1; ctc: 1; transfer learning: 1; entropy: 1; multi task: 1; feedforward neural nets: 1; rnn t: 1; text to speech front end: 1; sequence to sequence: 1; pipelines: 1; semi auto regressive: 1; joint modeling: 1; linguistics: 1; mandarin: 1; multi head self attention: 1; text normalization: 1; knowledge based systems: 1; imbalanced dataset: 1
Most Publications2022: 422023: 222021: 202020: 102019: 3


ICASSP2022 Ke Chen 0021, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov, 
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection.

ICASSP2022 Minglun Han, Linhao Dong, Zhenlin Liang, Meng Cai, Shiyu Zhou, Zejun Ma, Bo Xu 0002, 
Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection.

ICASSP2022 Shaoshi Ling, Chen Shen 0011, Meng Cai, Zejun Ma
Improving Pseudo-Label Training For End-To-End Speech Recognition Using Gradient Mask.

ICASSP2022 Yizhou Lu, Mingkun Huang, Xinghua Qu, Pengfei Wei, Zejun Ma
Language Adaptive Cross-Lingual Speech Representation Learning with Sparse Sharing Sub-Networks.

ICASSP2022 Chen Shen 0011, Yi Liu, Wenzhi Fan, Bin Wang, Shixue Wen, Yao Tian, Jun Zhang, Jingsheng Yang, Zejun Ma
The Volcspeech System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.

Interspeech2022 Zhiyun Fan, Zhenlin Liang, Linhao Dong, Yi Liu, Shiyu Zhou, Meng Cai, Jun Zhang, Zejun Ma, Bo Xu, 
Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire.

Interspeech2022 Kaiqi Fu, Shaojun Gao, Xiaohai Tian, Wei Li 0012, Zejun Ma
Using Fluency Representation Learned from Sequential Raw Features for Improving Non-native Fluency Scoring.

Interspeech2022 Junfeng Hou, Jinkun Chen, Wanyu Li, Yufeng Tang, Jun Zhang, Zejun Ma
Bring dialogue-context into RNN-T for streaming ASR.

Interspeech2022 Yufei Liu, Rao Ma, Haihua Xu, Yi He, Zejun Ma, Weibin Zhang, 
Internal Language Model Estimation Through Explicit Context Vector Learning for Attention-based Encoder-decoder ASR.

Interspeech2022 Xiaohai Tian, Kaiqi Fu, Shaojun Gao, Yiwei Gu, Kai Wang, Wei Li, Zejun Ma
A Transfer and Multi-Task Learning based Approach for MOS Prediction.

Interspeech2022 Chao Wang, Zhonghao Li, Benlai Tang, Xiang Yin 0006, Yuan Wan, Yibiao Yu, Zejun Ma
Towards high-fidelity singing voice conversion with acoustic reference and contrastive predictive coding.

AAAI2022 Ke Chen 0021, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov, 
Zero-Shot Audio Source Separation through Query-Based Learning from Weakly-Labeled Data.

KDD2022 Xinghua Qu, Pengfei Wei, Mingyong Gao, Zhu Sun, Yew Soon Ong, Zejun Ma
Synthesising Audio Adversarial Examples for Automatic Speech Recognition.

ICASSP2021 Yongwei Gao, Xingjian Du, Bilei Zhu, Xiaoheng Sun, Wei Li 0012, Zejun Ma
An Hrnet-Blstm Model With Two-Stage Training For Singing Melody Extraction.

ICASSP2021 Yuanbo Hou, Yi Deng, Bilei Zhu, Zejun Ma, Dick Botteldooren, 
Rule-Embedded Network for Audio-Visual Voice Activity Detection in Live Musical Video Streams.

ICASSP2021 Zhonghao Li, Benlai Tang, Xiang Yin 0006, Yuan Wan, Ling Xu, Chen Shen 0011, Zejun Ma
PPG-Based Singing Voice Conversion with Adversarial Representation Learning.

ICASSP2021 Junjie Pan, Lin Wu, Xiang Yin 0006, Pengfei Wu, Chenchang Xu, Zejun Ma
A Chapter-Wise Understanding System for Text-To-Speech in Chinese Novels.

ICASSP2021 Yao Tian, Haitao Yao, Meng Cai, Yaming Liu, Zejun Ma
Improving RNN Transducer Modeling for Small-Footprint Keyword Spotting.

Interspeech2021 Xianzhao Chen, Hao Ni, Yi He, Kang Wang, Zejun Ma, Zongxia Xie, 
Emitting Word Timings with HMM-Free End-to-End System in Automatic Speech Recognition.

Interspeech2021 Yuanbo Hou, Zhesong Yu, Xia Liang, Xingjian Du, Bilei Zhu, Zejun Ma, Dick Botteldooren, 
Attention-Based Cross-Modal Fusion for Audio-Visual Voice Activity Detection in Musical Video Streams.

#197  | Gabriel Synnaeve | Google Scholar   DBLP
Venues | Interspeech: 13 | ICASSP: 10 | ICML: 1
Years | 2022: 2 | 2021: 5 | 2020: 9 | 2019: 3 | 2018: 2 | 2016: 3
ISCA Section | self-supervision and semi-supervision for neural asr training: 2 | topics in asr: 2 | neural networks for language modeling: 2 | single-channel speech enhancement: 1 | multilingual and code-switched asr: 1 | computational resource constrained speech recognition: 1 | asr model training and strategies: 1 | sequence models for asr: 1 | topics in speech recognition: 1 | automatic learning of representations: 1
IEEE Keyword | speech recognition: 9 | natural language processing: 6 | pseudo labeling: 2 | channel bank filters: 2 | supervised learning: 1 | massively multilingual models: 1 | semi supervised learning: 1 | entropy: 1 | pattern classification: 1 | self supervision: 1 | semi supervised: 1 | contrastive learning: 1 | joint training: 1 | self training: 1 | pre training: 1 | self supervised learning: 1 | zero and low resource asr: 1 | dataset: 1 | text analysis: 1 | audio signal processing: 1 | unsupervised learning: 1 | unsupervised and semi supervised learning: 1 | distant supervision: 1 | transformer: 1 | video signal processing: 1 | ctc: 1 | hybrid asr: 1 | adversarial learning: 1 | multi task learning: 1 | speaker recognition: 1 | error statistics: 1 | automatic speech recognition: 1 | open source software: 1 | end to end: 1 | public domain software: 1 | c++ language: 1 | time domain analysis: 1 | approximation theory: 1 | transient response: 1 | siamese network: 1 | abx: 1 | scattering transform: 1 | abnet: 1
Most Publications | 2020: 27 | 2021: 26 | 2019: 17 | 2022: 16 | 2018: 11


ICASSP2022 Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert, 
Pseudo-Labeling for Massively Multilingual Speech Recognition.

ICASSP2022 Vineel Pratap, Qiantong Xu, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert, 
Word Order does not Matter for Speech Recognition.

ICASSP2021 Chaitanya Talnikar, Tatiana Likhomanenko, Ronan Collobert, Gabriel Synnaeve
Joint Masked CPC And CTC Training For ASR.

ICASSP2021 Qiantong Xu, Alexei Baevski, Tatiana Likhomanenko, Paden Tomasello, Alexis Conneau, Ronan Collobert, Gabriel Synnaeve, Michael Auli, 
Self-Training and Pre-Training are Complementary for Speech Recognition.

Interspeech2021 Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee 0001, Ronan Collobert, Gabriel Synnaeve, Michael Auli, 
Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training.

Interspeech2021 Tatiana Likhomanenko, Qiantong Xu, Jacob Kahn, Gabriel Synnaeve, Ronan Collobert, 
slimIPL: Language-Model-Free Iterative Pseudo-Labeling.

Interspeech2021 Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Paden Tomasello, Jacob Kahn, Gilad Avidov, Ronan Collobert, Gabriel Synnaeve
Rethinking Evaluation in ASR: Are Our Models Robust Enough?

ICASSP2020 Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdelrahman Mohamed, Emmanuel Dupoux, 
Libri-Light: A Benchmark for ASR with Limited or No Supervision.

ICASSP2020 Andros Tjandra, Chunxi Liu, Frank Zhang 0001, Xiaohui Zhang, Yongqiang Wang 0005, Gabriel Synnaeve, Satoshi Nakamura 0001, Geoffrey Zweig, 
DEJA-VU: Double Feature Presentation and Iterated Loss in Deep Transformer Networks.

Interspeech2020 Alexandre Défossez, Gabriel Synnaeve, Yossi Adi, 
Real Time Speech Enhancement in the Waveform Domain.

Interspeech2020 Da-Rong Liu, Chunxi Liu, Frank Zhang 0001, Gabriel Synnaeve, Yatharth Saraf, Geoffrey Zweig, 
Contextualizing ASR Lattice Rescoring with Hybrid Pointer Network Language Model.

Interspeech2020 Vineel Pratap, Anuroop Sriram, Paden Tomasello, Awni Y. Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert, 
Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters.

Interspeech2020 Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Y. Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert, 
Scaling Up Online Speech Recognition Using ConvNets.

Interspeech2020 Vineel Pratap, Qiantong Xu, Anuroop Sriram, Gabriel Synnaeve, Ronan Collobert, 
MLS: A Large-Scale Multilingual Dataset for Speech Research.

Interspeech2020 Qiantong Xu, Tatiana Likhomanenko, Jacob Kahn, Awni Y. Hannun, Gabriel Synnaeve, Ronan Collobert, 
Iterative Pseudo-Labeling for Speech Recognition.

ICML2020 Ronan Collobert, Awni Y. Hannun, Gabriel Synnaeve
Word-Level Speech Recognition With a Letter to Word Encoder.

ICASSP2019 Yossi Adi, Neil Zeghidour, Ronan Collobert, Nicolas Usunier, Vitaliy Liptchinsky, Gabriel Synnaeve
To Reverse the Gradient or Not: an Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition.

ICASSP2019 Vineel Pratap, Awni Y. Hannun, Qiantong Xu, Jeff Cai, Jacob Kahn, Gabriel Synnaeve, Vitaliy Liptchinsky, Ronan Collobert, 
Wav2Letter++: A Fast Open-source Speech Recognition System.

Interspeech2019 Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert, 
Who Needs Words? Lexicon-Free Speech Recognition.

ICASSP2018 Neil Zeghidour, Nicolas Usunier, Iasonas Kokkinos, Thomas Schatz, Gabriel Synnaeve, Emmanuel Dupoux, 
Learning Filterbanks from Raw Speech for Phone Recognition.

#198  | Chung-Cheng Chiu | Google Scholar   DBLP
Venues | ICASSP: 11 | Interspeech: 11 | ICML: 1 | ICLR: 1
Years | 2022: 2 | 2021: 7 | 2020: 7 | 2019: 2 | 2018: 5 | 2017: 1
ISCA Section | streaming for asr/rnn transducers: 2 | asr neural network architectures: 2 | non-autoregressive sequential modeling for speech processing: 1 | asr neural network architectures and training: 1 | streaming asr: 1 | training strategies for asr: 1 | asr neural network training: 1 | application of asr in medical practice: 1 | end-to-end speech recognition: 1
IEEE Keyword | speech recognition: 9 | recurrent neural nets: 5 | natural language processing: 3 | speech coding: 2 | conformer: 2 | optimisation: 2 | probability: 2 | end to end speech recognition: 2 | vocabulary: 2 | decoding: 2 | gradient methods: 2 | rnnt: 1 | two pass asr: 1 | long form asr: 1 | speaker recognition: 1 | end to end asr: 1 | non streaming asr: 1 | model distillation: 1 | streaming asr: 1 | latency: 1 | cascaded encoders: 1 | rnn t: 1 | regression analysis: 1 | multi domain training: 1 | data augmentation: 1 | supervised learning: 1 | text analysis: 1 | variational inference: 1 | sequence to sequence: 1 | end to end: 1 | monte carlo methods: 1 | gaussian distribution: 1 | online: 1 | las: 1 | language translation: 1 | decision theory: 1 | stochastic processes: 1 | automatic speech recognition: 1 | very deep convolutional neural networks: 1
Most Publications | 2020: 21 | 2021: 17 | 2019: 15 | 2018: 12 | 2017: 10


ICASSP2022 Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li 0028, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han 0002, Yonghui Wu, Yu Zhang 0033, 
Improving The Latency And Quality Of Cascaded Encoders.

ICML2022 Chung-Cheng Chiu, James Qin, Yu Zhang, Jiahui Yu, Yonghui Wu, 
Self-supervised learning with random-projection quantizer for speech recognition.

ICASSP2021 Thibault Doutre, Wei Han 0002, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang 0033, Liangliang Cao, 
Improving Streaming Automatic Speech Recognition with Non-Streaming Model Distillation on Unsupervised Data.

ICASSP2021 Bo Li 0028, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han 0002, Qiao Liang, Yu Zhang 0033, Trevor Strohman, Yonghui Wu, 
A Better and Faster end-to-end Model for Streaming ASR.

ICASSP2021 Jiahui Yu, Chung-Cheng Chiu, Bo Li 0028, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han 0002, Anmol Gulati, Yonghui Wu, Ruoming Pang, 
FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization.

Interspeech2021 Thibault Doutre, Wei Han 0002, Chung-Cheng Chiu, Ruoming Pang, Olivier Siohan, Liangliang Cao, 
Bridging the Gap Between Streaming and Non-Streaming ASR Systems by Distilling Ensembles of CTC and RNN-T Models.

Interspeech2021 Edwin G. Ng, Chung-Cheng Chiu, Yu Zhang, William Chan, 
Pushing the Limits of Non-Autoregressive Speech Recognition.

Interspeech2021 Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Ruoming Pang, David Rybach, Cyril Allauzen, Ehsan Variani, James Qin, Quoc-Nam Le-The, Shuo-Yiin Chang, Bo Li 0028, Anmol Gulati, Jiahui Yu, Chung-Cheng Chiu, Diamantino Caseiro, Wei Li 0133, Qiao Liang, Pat Rondon, 
An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling.

ICLR2021 Jiahui Yu, Wei Han 0002, Anmol Gulati, Chung-Cheng Chiu, Bo Li 0028, Tara N. Sainath, Yonghui Wu, Ruoming Pang, 
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling.

ICASSP2020 Daniel S. Park, Yu Zhang 0033, Chung-Cheng Chiu, Youzheng Chen, Bo Li 0028, William Chan, Quoc V. Le, Yonghui Wu, 
Specaugment on Large Scale Datasets.

ICASSP2020 Tara N. Sainath, Yanzhang He, Bo Li 0028, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li 0133, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alexander Gruenstein, Ke Hu, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirkó Visontai, Yonghui Wu, Yu Zhang 0033, Ding Zhao, 
A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency.

ICASSP2020 Tara N. Sainath, Ruoming Pang, Ron J. Weiss, Yanzhang He, Chung-Cheng Chiu, Trevor Strohman, 
An Attention-Based Joint Acoustic and Text on-Device End-To-End Model.

Interspeech2020 Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang 0033, Jiahui Yu, Wei Han 0002, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang, 
Conformer: Convolution-augmented Transformer for Speech Recognition.

Interspeech2020 Wei Han 0002, Zhengdong Zhang, Yu Zhang 0033, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu, 
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context.

Interspeech2020 Wei Li 0133, James Qin, Chung-Cheng Chiu, Ruoming Pang, Yanzhang He, 
Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition.

Interspeech2020 Daniel S. Park, Yu Zhang 0033, Ye Jia, Wei Han 0002, Chung-Cheng Chiu, Bo Li 0028, Yonghui Wu, Quoc V. Le, 
Improved Noisy Student Training for Automatic Speech Recognition.

Interspeech2019 Daniel S. Park, William Chan, Yu Zhang 0033, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le, 
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition.

Interspeech2019 Tara N. Sainath, Ruoming Pang, David Rybach, Yanzhang He, Rohit Prabhavalkar, Wei Li 0133, Mirkó Visontai, Qiao Liang, Trevor Strohman, Yonghui Wu, Ian McGraw, Chung-Cheng Chiu
Two-Pass End-to-End Speech Recognition.

ICASSP2018 Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Ekaterina Gonina, Navdeep Jaitly, Bo Li 0028, Jan Chorowski, Michiel Bacchiani, 
State-of-the-Art Speech Recognition with Sequence-to-Sequence Models.

ICASSP2018 Dieterich Lawson, Chung-Cheng Chiu, George Tucker, Colin Raffel, Kevin Swersky, Navdeep Jaitly, 
Learning Hard Alignments with Variational Inference.

#199  | Mikko Kurimo | Google Scholar   DBLP
Venues | Interspeech: 18 | ICASSP: 3 | SpeechComm: 1 | TASLP: 1
Years | 2022: 4 | 2021: 1 | 2020: 6 | 2019: 2 | 2018: 1 | 2017: 6 | 2016: 3
ISCA Section | neural networks for language modeling: 2 | show & tell: 2 | pathological speech assessment: 1 | neural network training methods for asr: 1 | low-resource asr development: 1 | miscellaneous topics in asr: 1 | automatic speech recognition for non-native children's speech: 1 | speech, language, and multimodal resources: 1 | language recognition: 1 | language learning and databases: 1 | corpus annotation and evaluation: 1 | show and tell: 1 | multimodal resources and annotation: 1 | lexical and pronunciation modeling: 1 | new products and services: 1 | show & tell session: 1
IEEE Keyword | speech recognition: 4 | language modeling: 2 | hidden markov models: 1 | children speech recognition: 1 | formant modification: 1 | dnn: 1 | speaker embedding: 1 | speaker adaptation: 1 | speaker aware training: 1 | end to end speech recognition: 1 | subword units: 1 | artificial neural networks: 1 | natural language processing: 1 | word classes: 1 | automatic speech recognition: 1 | latent dirichlet allocation: 1 | recurrent neural network: 1 | recurrent neural nets: 1 | document handling: 1
Most Publications | 2020: 22 | 2022: 14 | 2017: 14 | 2013: 14 | 2016: 12

Affiliations
Aalto University, Finland

SpeechComm2022 Hemant Kumar Kathania, Sudarsana Reddy Kadiri, Paavo Alku, Mikko Kurimo
A formant modification method for improved ASR of children's speech.

Interspeech2022 Yaroslav Getman, Ragheb Al-Ghezi, Katja Voskoboinik, Tamás Grósz, Mikko Kurimo, Giampiero Salvi, Torbjørn Svendsen, Sofia Strömbergsson, 
wav2vec2-based Speech Rating System for Children with Speech Sound Disorder.

Interspeech2022 Georgios Karakasidis, Tamás Grósz, Mikko Kurimo
Comparison and Analysis of New Curriculum Criteria for End-to-End ASR.

Interspeech2022 Aku Rouhe, Anja Virkkunen, Juho Leinonen 0002, Mikko Kurimo
Low Resource Comparison of Attention-based and Hybrid ASR Exploiting wav2vec 2.0.

Interspeech2021 Ragheb Al-Ghezi, Yaroslav Getman, Aku Rouhe, Raili Hildén, Mikko Kurimo
Self-Supervised End-to-End ASR for Low Resource L2 Swedish.

ICASSP2020 Hemant Kumar Kathania, Sudarsana Reddy Kadiri, Paavo Alku, Mikko Kurimo
Study of Formant Modification for Children ASR.

ICASSP2020 Aku Rouhe, Tuomas Kaseva, Mikko Kurimo
Speaker-Aware Training of Attention-Based End-to-End Speech Recognition Using Neural Speaker Embeddings.

Interspeech2020 Abhilash Jain, Aku Rouhe, Stig-Arne Grönroos, Mikko Kurimo
Finnish ASR with Deep Transformer Models.

Interspeech2020 Hemant Kumar Kathania, Mittul Singh, Tamás Grósz, Mikko Kurimo
Data Augmentation Using Prosody and False Starts to Recognize Non-Native Children's Speech.

Interspeech2020 Katri Leino, Juho Leinonen 0002, Mittul Singh, Sami Virpioja, Mikko Kurimo
FinChat: Corpus and Evaluation Setup for Finnish Chat Conversations on Everyday Topics.

Interspeech2020 Matias Lindgren, Tommi Jauhiainen, Mikko Kurimo
Releasing a Toolkit and Comparing the Performance of Language Embeddings Across Various Spoken Language Identification Datasets.

Interspeech2019 Reima Karhila, Anna-Riikka Smolander, Sari Ylinen, Mikko Kurimo
Transparent Pronunciation Scoring Using Articulatorily Weighted Phoneme Edit Distance.

Interspeech2019 Mittul Singh, Sami Virpioja, Peter Smit, Mikko Kurimo
Subword RNNLM Approximations for Out-Of-Vocabulary Keyword Search.

Interspeech2018 Aku Rouhe, Reima Karhila, Aija Elg, Minnaleena Toivola, Peter Smit, Anna-Riikka Smolander, Mikko Kurimo
Captaina: Integrated Pronunciation Practice and Data Collection Portal.

TASLP2017 Seppo Enarvi, Peter Smit, Sami Virpioja, Mikko Kurimo
Automatic Speech Recognition With Very Large Conversational Finnish and Estonian Vocabularies.

ICASSP2017 Md. Akmal Haidar, Mikko Kurimo
LDA-based context dependent recurrent neural network language model using document-based topic distribution of words.

Interspeech2017 Reima Karhila, Sari Ylinen, Seppo Enarvi, Kalle J. Palomäki, Aleksander Nikulin, Olli Rantula, Vertti Viitanen, Krupakar Dhinakaran, Anna-Riikka Smolander, Heini Kallio, Katja Junttila, Maria Uther, Perttu Hämäläinen, Mikko Kurimo
SIAK - A Game for Foreign Language Pronunciation Learning.

Interspeech2017 André Mansikkaniemi, Peter Smit, Mikko Kurimo
Automatic Construction of the Finnish Parliament Speech Corpus.

Interspeech2017 Aku Rouhe, Reima Karhila, Peter Smit, Mikko Kurimo
Reading Validation for Pronunciation Evaluation in the Digitala Project.

Interspeech2017 Peter Smit, Sami Virpioja, Mikko Kurimo
Improved Subword Modeling for WFST-Based Speech Recognition.

#200  | Bhiksha Raj | Google Scholar   DBLP
Venues | Interspeech: 12 | ICASSP: 7 | EMNLP: 1 | TASLP: 1 | NeurIPS: 1 | ICLR: 1
Years | 2022: 4 | 2021: 5 | 2020: 2 | 2019: 5 | 2018: 3 | 2017: 3 | 2016: 1
ISCA Section | privacy and security in speech communication: 2 | (multimodal) speech emotion recognition: 1 | novel models and training methods for asr: 1 | single-channel and multi-channel speech enhancement: 1 | acoustic event detection and acoustic scene classification: 1 | speaker recognition: 1 | speech synthesis: 1 | application of asr in medical practice: 1 | source separation and voice activity detection: 1 | speech and audio segmentation and classification: 1 | speech enhancement and applications: 1
IEEE Keyword | speaker recognition: 4 | natural language processing: 2 | audio signal processing: 2 | convolutional neural nets: 2 | acoustic signal detection: 2 | text analysis: 2 | speech synthesis: 2 | pattern classification: 1 | adversarial attacks: 1 | high frequency: 1 | filtering theory: 1 | robustness: 1 | audio classification: 1 | speech: 1 | steganography: 1 | adversarial examples: 1 | discrete cosine transforms: 1 | speaker identification: 1 | signal representation: 1 | sound event detection: 1 | pattern recognition: 1 | weak labels: 1 | confidence intervals: 1 | signal classification: 1 | jackknife estimates: 1 | audio databases: 1 | query processing: 1 | cross modal retrieval: 1 | siamese neural network: 1 | content based audio retrieval: 1 | search engines: 1 | joint audio text embedding: 1 | meta data: 1 | audio search engine: 1 | query by example: 1 | channel state information: 1 | wifi sensing: 1 | activity recognition: 1 | speech recognition: 1 | voice impersonation: 1 | style transfer: 1 | style transformation: 1 | generative adversarial network: 1 | i vectors: 1 | speaker verification: 1 | deep corrective learning networks: 1 | universal background model: 1 | recurrent neural nets: 1 | pattern matching: 1 | sound concepts: 1 | grammars: 1 | acoustic relations: 1 | audio events and scenes: 1 | sound and language: 1
Most Publications | 2022: 31 | 2016: 29 | 2012: 27 | 2021: 25 | 2011: 23

Affiliations
Carnegie Mellon University, Pittsburgh, USA

Interspeech2022 Hira Dhamyal, Bhiksha Raj, Rita Singh, 
Positional Encoding for Capturing Modality Specific Cadence for Emotion Detection.

Interspeech2022 Raphaël Olivier, Bhiksha Raj
Recent improvements of ASR models in the face of adversarial attacks.

Interspeech2022 Francisco Teixeira, Alberto Abad, Bhiksha Raj, Isabel Trancoso, 
Towards End-to-End Private Automatic Speaker Recognition.

Interspeech2022 Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar 0003, Shinji Watanabe 0001, Bhiksha Raj
Improving Speech Enhancement through Fine-Grained Speech Characteristics.

ICASSP2021 Raphaël Olivier, Bhiksha Raj, Muhammad Shah, 
High-Frequency Adversarial Defense for Speech and Audio.

ICASSP2021 Ali Shahin Shamsabadi, Francisco Sepúlveda Teixeira, Alberto Abad, Bhiksha Raj, Andrea Cavallaro, Isabel Trancoso, 
FoolHD: Fooling Speaker Identification by Highly Imperceptible Adversarial Disturbances.

Interspeech2021 Soham Deshmukh, Bhiksha Raj, Rita Singh, 
Improving Weakly Supervised Sound Event Detection with Self-Supervised Auxiliary Tasks.

Interspeech2021 Jiachen Lian, Aiswarya Vinod Kumar, Hira Dhamyal, Bhiksha Raj, Rita Singh, 
Masked Proxy Loss for Text-Independent Speaker Verification.

EMNLP2021 Raphaël Olivier, Bhiksha Raj
Sequential Randomized Smoothing for Adversarially Robust Speech Recognition.

Interspeech2020 Hira Dhamyal, Shahan Ali Memon, Bhiksha Raj, Rita Singh, 
The Phonetic Bases of Vocal Expressed Emotion: Natural versus Acted.

Interspeech2020 Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet, 
Hide and Speak: Towards Deep Neural Networks for Speech Steganography.

TASLP2019 Annamaria Mesaros, Aleksandr Diment, Benjamin Elizalde, Toni Heittola, Emmanuel Vincent 0001, Bhiksha Raj, Tuomas Virtanen, 
Sound Event Detection in the DCASE 2017 Challenge.

ICASSP2019 Benjamin Elizalde, Shuayb Zarar, Bhiksha Raj
Cross Modal Audio Search and Retrieval with Joint Embeddings Based on Text and Audio.

ICASSP2019 Daanish Ali Khan, Saquib Razak, Bhiksha Raj, Rita Singh, 
Human Behaviour Recognition Using Wifi Channel State Information.

NeurIPS2019 Yandong Wen, Bhiksha Raj, Rita Singh, 
Face Reconstruction from Voice using Generative Adversarial Networks.

ICLR2019 Yandong Wen, Mahmoud Al Ismail, Weiyang Liu, Bhiksha Raj, Rita Singh, 
Disjoint Mapping Network for Cross-modal Matching of Voices and Faces.

ICASSP2018 Yang Gao, Rita Singh, Bhiksha Raj
Voice Impersonation Using Generative Adversarial Networks.

ICASSP2018 Yandong Wen, Tianyan Zhou, Rita Singh, Bhiksha Raj
A Corrective Learning Approach for Text-Independent Speaker Verification.

Interspeech2018 M. Joana Correia, Bhiksha Raj, Isabel Trancoso, Francisco Teixeira, 
Mining Multimodal Repositories for Speech Affecting Diseases.

ICASSP2017 Anurag Kumar 0003, Bhiksha Raj, Ndapandula Nakashole, 
Discovering sound concepts and acoustic relations in text.