Speech Recognition

The Future of Speech Recognition, Conversational AI, and the Human Element


Mary, a mother of two, asks Theodora, “What will the weather be like for the rest of the month?” *Theodora tells her that in two weeks it will be a good time to gather coats and boots at her favorite store. Theodora knows Mary’s calendar, shopping habits, location, and projected weather anomalies, and remembers her questions. In Newton, Massachusetts, John, a single dad stuck in traffic, remembers that he may need an oil change. He asks Theodora, “When is a good time to get my oil changed?” Theodora responds, “John, you should contact Moore’s Oil Works in five weeks. Make sure it is before 3 pm, Tuesday through Friday. Wear a light sweater.” Theodora is a smart conversational AI agent activated by voice, built on advanced neural networks and speech recognition.

According to IBM, conversational artificial intelligence (AI) relies on large volumes of data and complex machine learning. As the term implies, it is natural speech directed at an artificial agent to obtain information or conduct a transaction. Today, you will likely interact with a conversational AI agent any time you call your bank or a retail customer support center. This innovation isn’t luck but a consequence of technological evolution and growing demands to increase efficiency while decreasing cost. Before anyone could talk to Theodora, there was, and still is, the arduous work of speech recognition.

Building a system that understands your voice, accounting for differences in language, dialect, and accent, requires an enormous amount of diverse text corpora: structured and unstructured text data. With that speech data in hand, the next two salient requirements not only increase accuracy but also drive technology adoption.
 

  • Domain-based transcription: Different industries and fields, from healthcare to law enforcement, require specialized knowledge to conduct day-to-day work. Technical and everyday jargon isn’t usually in a dictionary; it is learned over time and socialized in a human context. Audio captured from attending physicians during patient encounters across medical specialties contains both the specialized prognoses and diagnoses and each physician’s speech patterns, all of which must be preserved when transcribing the audio.

 

  • Error-correction feedback loop: Ten years ago, speech recognition models were roughly 80 percent accurate, meaning that about 20 percent of what you dictated would be transcribed incorrectly. Under ideal conditions and with plenty of domain-based text corpora, today’s rates exceed 90 percent. The key is not only an abundance of text data but also corrected data. The more text there is, the more corrections will be necessary; the feedback loop is thus virtuous, steadily improving output quality and keeping Theodora’s advice sound.
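The accuracy figures above are typically measured as word error rate (WER): edit distance between a reference transcript and the recognizer’s output, divided by the reference length. Below is a minimal Python sketch of that metric; the function name and sample sentences are illustrative, not from any production system.

```python
# Word error rate (WER): (substitutions + deletions + insertions) / reference word count.
# A minimal sketch using classic dynamic-programming edit distance over words.

def wer(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)

# A ten-word dictation with two misrecognized words is "80 percent accurate":
reference = "please schedule the oil change before three pm on tuesday"
hypothesis = "please schedule the oil chains before free pm on tuesday"
print(round(wer(reference, hypothesis), 2))  # 0.2, i.e., a 20 percent error rate
```

In a correction feedback loop, each human-corrected transcript becomes a new reference pair, so the model’s training data accumulates exactly the examples it previously got wrong.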

Rooted in efficiency and intelligence, speech recognition and conversational AI will be everywhere but not without plenty more learning and governance, where the human element is vital.

Company technology departments have set up or follow different types of governance, such as project management governance, data governance, and now AI governance. Governance by design ensures impartiality, security, ethics, and best practices in implementation and operational use. Although intelligent artificial agents may assist with governance, ongoing monitoring, systemic conflicts (i.e., what is uninterpretable to a machine), error correction, and much more, human beings strike the balance in a system meant to help, not hinder, societal and human progress.

AI governance is a growing need. With contextual speech recognition in use and voice data entry becoming the standard in healthcare to minimize physician burnout and increase productivity, errors in prognosis and diagnosis are going up. One study determined that documents generated by speech recognition contained 4.3 times as many errors as documents produced with a keyboard and mouse. This directly harms patient care, followed by healthcare organizations’ finances and the growing body of AI-based research. From coding the wrong condition to misidentifying trends, these errors contribute to declining health outcomes, missed opportunities for disease prevention, and faulty decision making.

We advocate efficiency through smart technology, with human intervention in its most vulnerable areas. Meeting tomorrow’s needs is also about addressing tomorrow’s challenges now.

Athreon has over 30 years of experience in speech-to-text work and over 10 years of technology expertise. Regardless of industry, if you’d like a one-time complimentary consultation about your speech-to-text and transcription needs, from a current or future perspective, contact us today.

* Theodora is the name of an Athreon AI concept model meant for understanding internal operational performance.