Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) Development

Automatic speech recognition (ASR), sometimes known as speech-to-text (S2T) or machine transcription, is a technology that converts spoken words into written text.

ASR continues to evolve, and Ubiqus’ research and development team has made it a priority to contribute to the technology's progress and to its direction in the industry.


Large Vocabulary Continuous Speech Recognition (LVCSR) 

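The form of ASR used by Ubiqus is known as large vocabulary continuous speech recognition (LVCSR). "Large vocabulary" means the system recognizes tens of thousands of words, and "continuous" means it handles natural, uninterrupted speech rather than isolated words.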

It works by automatically identifying very short sequences of audio and building words and sentences up from them, which makes it possible to produce a very high-quality transcript.

Advantages for Clients

The use of ASR, and of LVCSR in particular, offers many advantages, provided that the recording itself is of high quality. Benefits include, but are not limited to, the following:

Four-Step Process


1] Voice Activity Detection

The first step is to detect when speech is present in the recording. The soundtrack is then cut into speech segments, and the machine processes each of these segments in turn.
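To make the idea concrete, here is a minimal energy-based voice activity detector. It is an illustrative sketch, not Ubiqus' implementation: it assumes mono audio as a list of float samples, and the function names, the 160-sample frame length (10 ms at an assumed 16 kHz sample rate), the energy threshold and the `min_gap` rule are all hypothetical choices. Production systems usually rely on trained models rather than a fixed threshold.

```python
import math
from typing import List, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Root-mean-square energy of consecutive, non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def detect_speech(samples: List[float], frame_len: int = 160,
                  threshold: float = 0.02, min_gap: int = 3) -> List[Tuple[int, int]]:
    """Return (start_sample, end_sample) speech segments, end exclusive.

    A frame counts as speech when its energy reaches `threshold`. Silences
    shorter than `min_gap` frames are bridged so one utterance is not split.
    """
    energies = frame_energies(samples, frame_len)
    segments = []
    start = None      # index of the first speech frame of the open segment
    silent_run = 0    # consecutive silent frames inside the open segment
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_gap:
                end = i - silent_run + 1  # first silent frame closes the segment
                segments.append((start * frame_len, end * frame_len))
                start, silent_run = None, 0
    if start is not None:  # recording ended while a segment was still open
        end = len(energies) - silent_run
        segments.append((start * frame_len, end * frame_len))
    return segments


if __name__ == "__main__":
    def tone(n: int) -> List[float]:
        return [0.5 * math.sin(2 * math.pi * 440 * k / 16000) for k in range(n)]

    # Synthetic "recording": silence, speech, silence, speech, silence.
    audio = [0.0] * 800 + tone(1600) + [0.0] * 1600 + tone(800) + [0.0] * 320
    print(detect_speech(audio))
```

Each returned segment can then be passed on to the following steps independently.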


2] Diarization

Next comes diarization, the process of determining who spoke when: segments are grouped according to the speaker who produced them. To achieve this, the machine uses different models built on specific data, such as language and voice characteristics. These models allow it to distinguish subtle differences in how a language is spoken, such as accents.
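The grouping step can be sketched as clustering. The example below is a simplified illustration under stated assumptions: each segment is assumed to have already been turned into a numeric "voice fingerprint" (a speaker embedding, which in practice would come from a trained voice model not shown here), and the function names, the 0.8 similarity threshold and the greedy running-mean strategy are hypothetical. It does not name the speakers; it only labels which segments share a voice.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def diarize(embeddings: List[Vector], threshold: float = 0.8) -> List[int]:
    """Assign a speaker label (0, 1, 2, ...) to each segment embedding.

    Each segment joins the most similar known speaker if the similarity to
    that speaker's mean embedding reaches `threshold`; otherwise it starts
    a new speaker.
    """
    centroids: List[Vector] = []  # running mean embedding per speaker
    counts: List[int] = []        # number of segments per speaker
    labels: List[int] = []
    for emb in embeddings:
        best, best_sim = -1, -1.0
        for k, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best_sim >= threshold:
            n = counts[best]
            centroids[best] = [(c * n + e) / (n + 1)
                               for c, e in zip(centroids[best], emb)]
            counts[best] += 1
            labels.append(best)
        else:
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels


if __name__ == "__main__":
    # Four segments, alternating between two hypothetical voices.
    segments = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.2], [0.1, 0.95]]
    print(diarize(segments))
```

A greedy pass like this is easy to follow but sensitive to segment order; offline systems often cluster all segments at once instead.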


3] Decoding

This is the stage where the actual transcription begins. For each audio segment, the engine establishes a list of possible phonemes (the smallest units of sound in a language) and of the words they could form. No full sentences are generated yet, just an exhaustive list of possibilities, each assigned a score.
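As a rough picture of "a list of possibilities, each assigned a score", the sketch below runs a small beam search over per-frame phoneme probabilities and keeps the best-scoring candidate sequences. The probabilities are made-up inputs standing in for an acoustic model's output, and the function name, the phoneme labels and the beam width are hypothetical; real LVCSR decoders also use a pronunciation lexicon and search over words, which is omitted here.

```python
import math
from typing import Dict, List, Tuple

Hypothesis = Tuple[List[str], float]  # (phoneme sequence, total log-probability)


def beam_decode(frames: List[Dict[str, float]], beam_width: int = 3) -> List[Hypothesis]:
    """Return up to `beam_width` candidate phoneme sequences, best first.

    `frames[t]` maps each candidate phoneme at position t to its probability.
    Scores are summed log-probabilities, so higher (closer to 0) is better.
    """
    beams: List[Hypothesis] = [([], 0.0)]
    for probs in frames:
        candidates = [
            (seq + [phoneme], score + math.log(p))
            for seq, score in beams
            for phoneme, p in probs.items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]  # prune to the most promising candidates
    return beams


if __name__ == "__main__":
    # Hypothetical acoustic scores for a segment that sounds like "cat".
    frames = [{"k": 0.6, "g": 0.4}, {"ae": 0.7, "eh": 0.3}, {"t": 0.9, "d": 0.1}]
    for seq, score in beam_decode(frames):
        print(" ".join(seq), round(score, 3))
```

The beam width trades accuracy for speed: a wider beam keeps more alternatives alive for the rescoring step.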


4] Rescoring

Drawing on the scored phonemes and words from the decoding stage, the engine now chooses the combination most likely to form the correct sentence, much as a GPS picks the best route among many possible ones. The selected sentence is then transcribed.
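A common way to perform this kind of selection in LVCSR systems is to combine each candidate's acoustic score with a language-model score that rates how plausible the word sequence is. The sketch below assumes the decoding step has already produced word-level candidates with log-probability scores; the tiny bigram table, the `lm_weight` parameter, the `UNSEEN` floor and all function names are hypothetical and do not describe Ubiqus' engine.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical bigram probabilities P(word | previous word); "<s>" marks sentence start.
BIGRAMS: Dict[Tuple[str, str], float] = {
    ("<s>", "recognize"): 0.2, ("recognize", "speech"): 0.5,
    ("<s>", "wreck"): 0.01, ("wreck", "a"): 0.1,
    ("a", "nice"): 0.05, ("nice", "beach"): 0.1,
}
UNSEEN = 1e-6  # floor probability for word pairs missing from the table


def lm_log_prob(words: List[str]) -> float:
    """Log-probability of a word sequence under the toy bigram model."""
    total, prev = 0.0, "<s>"
    for word in words:
        total += math.log(BIGRAMS.get((prev, word), UNSEEN))
        prev = word
    return total


def rescore(nbest: List[Tuple[List[str], float]],
            lm_weight: float = 1.0) -> List[Tuple[List[str], float]]:
    """Re-rank (words, acoustic_log_score) candidates, best first.

    Combined score = acoustic score + lm_weight * language-model score.
    """
    rescored = [(words, ac + lm_weight * lm_log_prob(words)) for words, ac in nbest]
    return sorted(rescored, key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    # Acoustically, the first candidate scores slightly better...
    nbest = [(["wreck", "a", "nice", "beach"], -10.0), (["recognize", "speech"], -11.0)]
    best_words, best_score = rescore(nbest)[0]
    # ...but the language model makes "recognize speech" the winner.
    print(" ".join(best_words), round(best_score, 3))
```

Like a GPS weighing distance against traffic, the combined score balances how well the words match the sound against how likely they are as a sentence; `lm_weight` controls that balance.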


This same process applies to every segment of the audio recording.

When the automated process is complete, the document is reviewed by Ubiqus proofreaders, just like any other document. In addition to checking the content in general, the proofreaders make sure that each passage of speech has been attributed to the correct speaker.