The paper is here. The pipeline and pretrained models using open datasets are available here.
Below are the audio samples used in the MOS tests.
Ground Truth | Baseline, 18K data | Baseline, 10K data | NLR, 18K data | NLR, 10K data |
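人均寿命可达七十岁左右。 (Average life expectancy can reach about seventy years.) | ||||
两个月后,洗衣机干脆不能用了。 (Two months later, the washing machine simply stopped working.) | ||||
山地好恬静,只有秋虫此起彼伏的轻声吟唱。 (The mountains are so tranquil; only the soft singing of autumn insects rises and falls.) | ||||
本报讯:与缅甸、老挝、越南三国接壤的云南省思茅地区,禁毒任务繁重。 (Our newspaper reports: Simao Prefecture in Yunnan Province, which borders Myanmar, Laos, and Vietnam, faces a heavy drug-control workload.) | ||||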
Even after the model is trained, the pronunciation of characters can be manipulated by editing the lexicon texts, and new knowledge can be introduced in the same way. Below is an example, using the script "矿工从巷道中走出" ("The miners walk out of the mine tunnel"):
Manipulated lexicon text | Changes | Audio |
矿 ● kuàng (1)矿物,蕴藏在地层中的自然物质:矿藏(cáng)。铁矿。煤矿。矿产。矿泉。矿源。 (2)开采矿物的场所:矿井。矿坑。下矿。 巷 ● xiàng 胡同,里弄:小巷。陋巷。穷巷。巷陌(街道)。巷战(在城市街巷里进行的战斗)。穷街陋巷。 ● hàng (1)〔巷道〕采矿或探矿时挖的坑道。(2)义同(一)。 | (Original) | (kuàng gōng cóng hàng dào zhōng zǒu chū) |
释义:矿 ● wáng (1)矿物,蕴藏在地层中的自然物质:矿藏(cáng)。铁矿。煤矿。矿产。矿泉。矿源。 (2)开采矿物的场所:矿井。矿坑。下矿。 巷 ● xiàng 胡同,里弄:小巷。陋巷。穷巷。巷陌(街道)。巷战(在城市街巷里进行的战斗)。穷街陋巷。 | "巷" is a heteronym (a character with more than one reading). In this script it should take the less common reading hàng rather than the usual xiàng. In this sample we removed the lexicon text describing the reading hàng and changed the pronunciation of "矿" from kuàng to wáng. | (wáng gōng cóng xiàng dào zhōng zǒu chū) |
释义:矿 ● kuàng (1)矿物,蕴藏在地层中的自然物质:矿藏(cáng)。铁矿。煤矿。矿产。矿泉。矿源。 (2)开采矿物的场所:矿井。矿坑。下矿。● tuó〔矿工〕矿山工人;尤指采矿的工人 | In this sample we add a second, fake pronunciation (tuó) to the character "矿", turning it into a heteronym, and give the new reading a definition that matches the context (矿工). This shows that, for low-resource languages with incomplete lexicons, pronunciation knowledge can easily be added after the model is trained. | (tuó gōng cóng hàng dào zhōng zǒu chū) |
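The two manipulations in the table above are plain text edits to the lexicon entries that the model reads. The sketch below is only an illustration of those edits: it models an entry as a list of (pinyin, definition) readings and renders it in the "● pinyin definition" layout seen in the table. All function names and the data layout are hypothetical and are not taken from the released pipeline, the definitions are abbreviated, and the code does not run the model or produce audio.

```python
# Hypothetical sketch: none of these names come from the released pipeline.
# A lexicon entry is modelled as a list of (pinyin, definition) readings.


def render_entry(char, readings):
    """Render one entry as '<char> ● <pinyin> <definition> ● ...'."""
    return char + " " + " ".join(f"● {p} {d}" for p, d in readings)


def drop_reading(readings, pinyin):
    """Return a copy of the readings without the given pronunciation."""
    return [(p, d) for p, d in readings if p != pinyin]


def replace_pinyin(readings, old, new):
    """Return a copy where the pronunciation `old` is rewritten to `new`."""
    return [(new if p == old else p, d) for p, d in readings]


def add_reading(readings, pinyin, definition):
    """Return a copy with one extra (pinyin, definition) reading appended."""
    return readings + [(pinyin, definition)]


# Abbreviated versions of the original entries (definitions shortened here).
ORIGINAL = {
    "矿": [("kuàng", "矿物,蕴藏在地层中的自然物质。")],
    "巷": [("xiàng", "胡同,里弄。"), ("hàng", "〔巷道〕采矿或探矿时挖的坑道。")],
}

# Manipulation 1: change the reading of 矿 and remove the hàng reading of 巷.
swapped = {
    "矿": replace_pinyin(ORIGINAL["矿"], "kuàng", "wáng"),
    "巷": drop_reading(ORIGINAL["巷"], "hàng"),
}

# Manipulation 2: add a fake context-specific reading tuó to 矿.
extended = dict(ORIGINAL)
extended["矿"] = add_reading(ORIGINAL["矿"], "tuó", "〔矿工〕矿山工人。")

for name, lexicon in [("original", ORIGINAL), ("swapped", swapped), ("extended", extended)]:
    print(name)
    for char, readings in lexicon.items():
        print("  " + render_entry(char, readings))
```

Each helper returns a new list instead of mutating its input, so the original lexicon stays available for comparison with the edited variants.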