The demand for transcription, subtitles, and captioning is on the rise. People increasingly watch television and other video in noisy venues, or in quiet ones where sound can't be played, so they can't always rely on the soundtrack that accompanies a video. Captions are also needed to meet accessibility requirements for people who are unable to hear audio. However, each of the usual options has drawbacks:
- Automated voice-to-text systems, such as YouTube's automatic captioning, typically make too many errors to provide a meaningful, accurate text version of what's being said.
- Professional transcription or captioning services are very costly.
- Doing the work yourself is time-consuming and can require expensive software (although free options exist).
Better Audio Quality
One of the problems with Google's, or any other, voice-to-text system is that it has difficulty transcribing poor-quality recordings of speech. When we listen to speech, our brains compensate for a lot of background noise, so a recording may sound fine to us even though it lacks the nuances the software needs to transcribe it accurately. Audio quality standards that may have been adequate in the past are no longer sufficient for automated captioning. The solution is to record with the best microphones and equipment possible, such as clip-on microphones, which clearly capture as many of the speaker's subtleties as possible.
Real-Time Parallel Transcription
In the past, you had to buy expensive voice-to-text dictation software and a high-quality microphone, then spend considerable time training the software to recognize your voice. Even then, the results were often inaccurate.
With the introduction of Siri on the iPhone, voice-to-text dictation became available to everyone for free, with no training required. Anyone could speak into an iPhone and see their words typed. This works because the audio of your speech is sent over the Internet to very powerful computers, which convert it to text and send the result back.
“The technology behind Siri was actually first developed by SimulScribe to convert telephone voicemail messages to text. Sometime around the year 2007, in speaking with the president of the company, I asked if they were using a technology similar to ECHELON. He told me that it wasn’t similar to ECHELON, but in fact was ECHELON. They were renting computing time on the government surveillance supercomputers that were so powerful they could convert any speech to text. This is how the transcription could be so accurate.” ~ Greg Johnson
While Siri works well with live speech, it’s not so effective with recorded speech.
If you need a transcript quickly, here's an easy and inexpensive solution. Put on headphones so the phone picks up only your voice, and while watching the video you want transcribed, repeat everything that's said, word for word, into your iPhone. Because Siri handles live speech well, you can enjoy the convenience of nearly 100% accurate transcription! This works for videos as well as audio-only recordings.