Technology Exploration

Challenges and Preliminary Study of Indoor Distant Speech Recognition

Abstract

Distant speech recognition accuracy depends strongly on the type of recording device, the room's acoustic response, the speaker's location and orientation, and environmental noise. This article analyzes each of these factors, proposes possible solutions, and reports preliminary experiments. The results show that distance degrades the speech signal of consonants such as nasals and affricates, so that recognition and verification accuracy for these sounds drops significantly as distance increases. After model adaptation using our distant speech corpus, recognition accuracy improves by about 10%. When deep neural networks trained with the distant speech corpus are used as acoustic models, the gain is larger still: a syllable recognition accuracy improvement rate of about 60%.

Key Words

Automatic Speech Recognition (ASR)
Deep Neural Network (DNN)
Voice User Interface (VUI)

Related file: Challenges and Preliminary Study of Indoor Distant Speech Recognition (full text)