Offline speech to text engine for delimited context in combination with an offline speech assistant

Weiß, Pia-Maria

doi:10.25924/opus-4557

The inatura museum in Dornbirn planned an interactive exhibit resembling a speech assistant. The concept was that visitors could ask the exhibit questions they would like to ask a flower. Functional requirements were formulated, such as the ability to run offline for privacy reasons. Because the exhibit resembles a speech assistant, open-source offline speech-to-text (STT) engines and speech assistants were examined, along with proprietary cloud-based STT engines and their associated speech assistants. The aim was to evaluate the hypothesis that an open-source offline STT engine can compete with a proprietary cloud-based STT engine, and to identify a suitable STT engine or speech assistant for the exhibit. The options for adapting the STT models were also analysed. After the technical analysis, the STT engine Vosk was chosen, and attempts were made to adapt the Vosk model. To evaluate the hypothesis, Vosk was compared with the proprietary cloud-based Google Cloud Speech to Text. The comparison showed no significant difference between the two, so Vosk was recommended for the exhibit. Because intent parsing functionality was lacking, two algorithms, called the "text matching algorithm" and the "text and keyword matching algorithm", were implemented and tested. The tests showed that the text and keyword matching algorithm performed better, with an average success rate of 83.93 %. Consequently, this algorithm was recommended for the intent parsing of the exhibit. Finally, possible adaptations of the algorithms were suggested, such as using a different string matching library, and some improvements to the exhibit were presented.
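The abstract does not describe how the "text and keyword matching algorithm" works internally. The following is only a minimal sketch of the general idea of combining string similarity with keyword hits to map a recognised sentence to an intent. It is not the thesis's implementation: the intents, example questions, keywords, weights, threshold, and function names are all hypothetical, and Python's standard `difflib` stands in for whatever string matching library the thesis used.

```python
from difflib import SequenceMatcher

# Hypothetical intents for a flower exhibit. The example questions and
# keywords are invented for illustration and do not come from the thesis.
INTENTS = {
    "water": {
        "examples": ["how much water do you need", "are you thirsty"],
        "keywords": {"water", "thirsty", "drink"},
    },
    "sun": {
        "examples": ["do you like the sun", "how much light do you need"],
        "keywords": {"sun", "light", "shade"},
    },
}


def text_score(utterance, examples):
    """Best string similarity (0..1) between the utterance and any example."""
    return max(SequenceMatcher(None, utterance, ex).ratio() for ex in examples)


def keyword_score(utterance, keywords):
    """1.0 if the utterance contains at least one keyword, otherwise 0.0."""
    return 1.0 if set(utterance.split()) & keywords else 0.0


def parse_intent(utterance, threshold=0.6, keyword_weight=0.5):
    """Return the best-scoring intent name, or None if no score reaches the threshold.

    The weighting and the threshold are assumptions chosen for this sketch.
    """
    utterance = utterance.lower().strip()
    best_intent, best_score = None, 0.0
    for name, spec in INTENTS.items():
        score = ((1 - keyword_weight) * text_score(utterance, spec["examples"])
                 + keyword_weight * keyword_score(utterance, spec["keywords"]))
        if score > best_score:
            best_intent, best_score = name, score
    return best_intent if best_score >= threshold else None
```

With these assumed weights, an utterance without any keyword can score at most 0.5 and is therefore rejected, which gives the exhibit a way to answer "I did not understand" instead of guessing.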

Metadata
Author:	Pia-Maria Weiß
DOI:	https://doi.org/10.25924/opus-4557
Advisor:	Walter Ritter
Document Type:	Master's Thesis
Language:	English
Year of publication:	2022
Publishing Institution:	FH Vorarlberg (Fachhochschule Vorarlberg)
Granting Institution:	FH Vorarlberg (Fachhochschule Vorarlberg)
Release Date:	2022/10/03
Number of pages:	V, 72, LXXXVI
DDC classes:	000 Computer science, information and general works
Open Access?:	yes
Course of Studies:	Computer Science
Licence (German):	UrhG - The Austrian Copyright Act applies
