What is Automatic Speech Recognition?
Automatic Speech Recognition (ASR) is the term given to the technology used to transcribe spoken words into written text.
Ubiqus uses one form of ASR, Large Vocabulary Continuous Speech Recognition (LVCSR), which is based on the automatic identification of very short audio sequences. This technology can produce a very high-quality transcription, provided that the recording has been made correctly. ASR has developed significantly in recent years, and our R&D team is contributing to its continued progress.
Our working method means that we can handle not only recordings containing non-specialised vocabulary, but also those that include more specific terms (technical, legal, medical, etc.).
The production of a final transcription involves a 4-step process:
1 | Voice Activity Detection
First, we identify when speech is present in the recording so that the soundtrack can be cut into segments. The machine then works on each of these segments.
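To make this step more concrete, here is a minimal sketch of one simple way to detect speech: measuring the energy of short frames and keeping the stretches that rise above a threshold. This is an illustration under stated assumptions, not a description of the Ubiqus system. The function name, the frame size and the threshold are all hypothetical, and production systems generally rely on trained models rather than a fixed energy threshold.

```python
import math


def detect_speech_segments(samples, frame_size=160, threshold=0.02, min_frames=2):
    """Return (start_sample, end_sample) spans where frame energy exceeds threshold.

    Assumptions (illustrative only): `samples` is a list of floats in [-1, 1],
    and a frame counts as speech when its RMS energy is above `threshold`.
    """
    segments = []
    start = None  # index of the first frame of the current speech run
    n_frames = len(samples) // frame_size

    for i in range(n_frames):
        frame = samples[i * frame_size:(i + 1) * frame_size]
        rms = math.sqrt(sum(x * x for x in frame) / frame_size)
        is_speech = rms > threshold

        if is_speech and start is None:
            start = i  # a speech run begins
        elif not is_speech and start is not None:
            # a speech run ends; keep it only if it is long enough
            if i - start >= min_frames:
                segments.append((start * frame_size, i * frame_size))
            start = None

    # close a run that lasts until the end of the recording
    if start is not None and n_frames - start >= min_frames:
        segments.append((start * frame_size, n_frames * frame_size))
    return segments


# Synthetic recording: silence, a loud stretch, then silence again.
recording = [0.0] * 320 + [0.5, -0.5] * 240 + [0.0] * 320
print(detect_speech_segments(recording))
```

Each returned pair marks one segment that the later steps would process independently.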
2 | Diarization
Next, we need to identify the different speakers in each recording and group the segments by speaker, solving the problem of ‘who speaks when?’ For this, the machine uses different models containing specific data (languages, voices). In this way, it can account for the subtleties of a language (such as accents). Note that at this point, we are still processing the data in a “mathematical” way.
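The idea of grouping segments by speaker can be sketched as follows. The example assumes that each segment has already been turned into a numeric "voice vector" (an embedding) by some model, which is the “mathematical” representation mentioned above. The vectors, the function names and the similarity threshold are hypothetical, and the greedy grouping shown here is a simplified stand-in for the clustering methods used in real diarization systems.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_speakers(embeddings, threshold=0.8):
    """Give each segment a speaker label (0, 1, 2, ...).

    A segment joins the most similar known speaker if the similarity reaches
    `threshold`; otherwise it starts a new speaker. The first segment of each
    speaker serves as that speaker's reference vector.
    """
    references = []  # one reference vector per speaker found so far
    labels = []
    for emb in embeddings:
        best, best_sim = None, threshold
        for idx, ref in enumerate(references):
            sim = cosine(emb, ref)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            references.append(emb)
            best = len(references) - 1
        labels.append(best)
    return labels


# Hypothetical 2-dimensional voice vectors for five consecutive segments.
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95], [1.0, 0.05]]
print(assign_speakers(segments))
```

The output is one label per segment, which answers ‘who speaks when?’ once combined with the segment timestamps.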
3 | Decoding
This is when the actual transcription starts. A list of possible phonemes (the basic sound units of speech) is established for each audio segment. At this stage, no full sentences have been generated, only one long list of possibilities, each with a score.
4 | Rescoring
The computer chooses, from all the phonemes and words learned during the initial phase, those that are likely to form the most accurate sentence (it’s a little like the way a GPS identifies the best route). It is this sentence that is transcribed into the document.
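The GPS comparison can be illustrated with a small best-path search. In this sketch, each position in a segment has several candidate words with a score from the decoding step, and a second score says how plausible it is for one word to follow another. The search keeps the highest-scoring route through the candidates. The data, the scores and the function name are invented for illustration, and a simple word-pair (bigram) table is assumed where a real system would use a much richer language model.

```python
def best_sentence(lattice, bigram_logprob, unseen=-5.0):
    """Pick the highest-scoring word sequence through a candidate lattice.

    lattice: list of positions, each a list of (word, acoustic_log_score).
    bigram_logprob: dict mapping (previous_word, word) to a log score.
    unseen: log score used for word pairs missing from the table.
    """
    # best[word] = (total score, words so far) for the best path ending in word
    best = {"<s>": (0.0, [])}  # "<s>" marks the start of the sentence
    for candidates in lattice:
        new_best = {}
        for word, acoustic in candidates:
            for prev, (score, path) in best.items():
                total = score + acoustic + bigram_logprob.get((prev, word), unseen)
                if word not in new_best or total > new_best[word][0]:
                    new_best[word] = (total, path + [word])
        best = new_best
    score, path = max(best.values(), key=lambda item: item[0])
    return path, score


# Invented scores: the sounds alone slightly favour the wrong words,
# but the word-pair scores favour the sentence that makes sense.
lattice = [
    [("recognize", -0.4), ("wreck a nice", -0.3)],
    [("speech", -0.5), ("beach", -0.4)],
]
bigrams = {
    ("<s>", "recognize"): -1.0,
    ("<s>", "wreck a nice"): -3.0,
    ("recognize", "speech"): -0.5,
    ("wreck a nice", "beach"): -1.0,
}
words, score = best_sentence(lattice, bigrams)
print(" ".join(words))
```

With these made-up numbers, the route "recognize speech" wins even though each of its words has a slightly lower individual score, which is the point of rescoring whole sentences rather than picking the best candidate at each position.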
This process is applied to every segment of the recording. The final result is a complete transcription.
At the end of this automated process, the document is re-read by our teams, in the same way as any other Ubiqus document: in addition to checking the content as a whole, the proofreader will also ensure the speech has been correctly attributed.