Year of publication
- 2022 (1)
Document Type
- Master's Thesis (1)
Language
- English (1)
Offline speech to text engine for delimited context in combination with an offline speech assistant
(2022)
The inatura museum in Dornbirn planned an interactive exhibit resembling a speech assistant. The concept was that visitors could ask the exhibit questions they would like to ask a flower. Functional requirements for the solution were formulated, such as the capacity to run offline for privacy reasons. Because the exhibit resembles a speech assistant, open-source offline Speech To Text (STT) engines and speech assistants were examined. Proprietary cloud-based STT engines and their corresponding speech assistants were also researched. The aim was to test the hypothesis that an open-source offline STT engine can compete with a proprietary cloud-based STT engine. Additionally, a suitable STT engine or speech assistant had to be identified for the exhibit. Furthermore, the possibilities for adapting the STT models were analysed. After the technical analysis, the decision was made in favour of the STT engine "Vosk", followed by attempts to adapt the Vosk model. To evaluate the hypothesis, Vosk was compared with the proprietary cloud-based Google Cloud Speech to Text. The comparison showed no significant difference between Vosk and Google Cloud Speech to Text. Based on this result, Vosk was recommended for the exhibit. Because Vosk provides no intent parsing functionality, two algorithms called "text matching algorithm" and "text and keyword matching algorithm" were implemented and tested. The test showed that the text and keyword matching algorithm performed better, with an average success rate of 83.93 %. Consequently, this algorithm was recommended for the intent parsing of the exhibit. Finally, possible adaptations of the algorithms were suggested, such as using a different string matching library, and some improvements to the exhibit were presented.
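The abstract does not describe how the two intent parsing algorithms work internally. As a rough illustration of the general idea only, the following minimal sketch matches an STT transcript against example questions, first by string similarity alone and then by string similarity plus a keyword bonus. Everything in it is an assumption made for illustration: the intents, example questions, keywords, threshold, bonus value, function names, and the choice of Python's standard `difflib` as the string matching library are hypothetical and not taken from the thesis.

```python
from difflib import SequenceMatcher

# Hypothetical intents, example questions, and keywords (illustration only).
INTENTS = {
    "water": {
        "examples": ["how much water do you need", "do you need a lot of water"],
        "keywords": {"water", "drink", "thirsty"},
    },
    "sunlight": {
        "examples": ["how much sun do you need", "do you like sunlight"],
        "keywords": {"sun", "sunlight", "light"},
    },
    "age": {
        "examples": ["how old are you", "how long do you live"],
        "keywords": {"old", "age", "live"},
    },
}


def text_score(transcript, examples):
    """Best string similarity (0.0 to 1.0) between the transcript and any example."""
    return max(SequenceMatcher(None, transcript, ex).ratio() for ex in examples)


def match_text(transcript, threshold=0.5):
    """Text matching only: pick the intent whose example is most similar."""
    transcript = transcript.lower().strip()
    best_intent, best_score = None, 0.0
    for name, spec in INTENTS.items():
        score = text_score(transcript, spec["examples"])
        if score > best_score:
            best_intent, best_score = name, score
    # Reject the match if even the best candidate is too dissimilar.
    return best_intent if best_score >= threshold else None


def match_text_and_keywords(transcript, threshold=0.5, keyword_bonus=0.3):
    """Text matching plus a fixed bonus when the transcript contains an intent keyword."""
    transcript = transcript.lower().strip()
    words = set(transcript.split())
    best_intent, best_score = None, 0.0
    for name, spec in INTENTS.items():
        score = text_score(transcript, spec["examples"])
        if words & spec["keywords"]:
            score = min(1.0, score + keyword_bonus)
        if score > best_score:
            best_intent, best_score = name, score
    return best_intent if best_score >= threshold else None
```

In this sketch, a question phrased differently from every stored example, such as `"are you thirsty"`, can fall below the similarity threshold in `match_text` yet still be resolved by `match_text_and_keywords` because of the keyword `thirsty`. Swapping `difflib` for another string matching library would only require changing `text_score`.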