Mutian He

PhD Student
Idiap Research Institute
École Polytechnique Fédérale de Lausanne (EPFL)

Email: mutian.he at idiap dot ch

GitHub / Google Scholar / CV

I am a PhD candidate at the Idiap Research Institute, EPFL, Switzerland, advised by Phil Garner, working on spoken language understanding by combining speech and NLP techniques. Before that, I received my B.E. degree from Beihang University (BUAA) in 2019 and my MPhil degree from the Hong Kong University of Science and Technology in 2022, with a thesis on conceptualization in commonsense reasoning, advised by Yangqiu Song. I have also worked on speech synthesis at Microsoft, focusing on robustness, multilinguality, and low-resource conditions.

I'm interested in a broad range of topics on the machine learning side of speech and language processing, including pretraining, modelling, and generation.


  • The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation
    Findings of EMNLP-2023 [Paper] [Code & Data]
    Mutian He, Philip N. Garner
  • Can ChatGPT Detect Intent? Evaluating Large Language Models for Spoken Language Understanding
    Interspeech-2023 [Paper] [Resources]
    Mutian He, Philip N. Garner
  • Acquiring and Modelling Abstract Commonsense Knowledge via Conceptualization
    [Paper] [Code]
    Mutian He, Tianqing Fang, Weiqi Wang, Yangqiu Song
  • Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge
    Interspeech-2022 [Paper] [Demo] [Code]
    Mutian He, Jingzhou Yang, Lei He, Frank K. Soong
  • Multilingual Byte2Speech Models for Scalable Low-resource Speech Synthesis
    [Paper] [Demo] [Code]
    Mutian He, Jingzhou Yang, Lei He, Frank K. Soong
  • On the Role of Conceptualization in Commonsense Knowledge Graph Construction
    [Paper] [Code]
    Mutian He, Yangqiu Song, Kun Xu, Dong Yu
  • Neural Subgraph Isomorphism Counting
    KDD-2020 [Paper]
    Xin Liu, Haojie Pan, Mutian He, Yangqiu Song, Xin Jiang
  • Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS
    Interspeech-2019 [Paper] [Demo] [Code]
    Mutian He, Yan Deng, Lei He
  • Time-evolving Text Data Classification with Deep Neural Networks
    IJCAI-2018 [Paper]
    Yu He, Jianxin Li, Yangqiu Song, Mutian He, Hao Peng


  • Intro to Natural Language Processing, HKUST, Spring 2020
  • Intro to Speech Processing, Idiap, Fall 2022


CSRankings is a powerful tool for identifying active researchers in various fields of computer science, but it does not cover speech. Inspired by it, I created a similar tool, Speech Rankings, while looking for potential PhD advisors.