
Steve Clayton: Speak New Languages Instantly With Microsoft Tech


This voice recognition software translates your speech and then uses your own voice to talk in another language.

Steve Clayton, Microsoft
12 November 2012

Contributed by Rick Rashid, Microsoft’s Chief Research Officer

A demonstration I gave in Tianjin, China, at Microsoft Research Asia’s 21st Century Computing event has started to attract some attention, so I wanted to share a little background on the history of speech-to-speech technology and the advances we’re seeing today.

Of all natural user interfaces, the single most important, yet also one of the most difficult for computers, is human speech.

For the last 60 years, computer scientists have been working to build systems that can understand what a person says when they talk.

In the beginning, the approach used could best be described as simple pattern matching. The computer would examine the waveforms produced by human speech and try to match them to waveforms that were known to be associated with particular words.
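To make that approach concrete, here is a minimal sketch of naive template matching, assuming each word is stored as one short, equal-length sequence of samples and the input is assigned to whichever stored sequence is closest. The `TEMPLATES` table, its numbers and the `distance` and `recognize` helpers are hypothetical and only illustrate the idea; early systems compared real recordings, not five hand-written numbers.

```python
import math

# Hypothetical stored templates: each word maps to a short sampled "waveform".
# The values are illustrative only, not real speech data.
TEMPLATES = {
    "yes": [0.0, 0.8, 0.3, -0.5, 0.1],
    "no":  [0.0, -0.6, -0.2, 0.7, 0.0],
}

def distance(a, b):
    """Euclidean distance between two equal-length sample sequences."""
    if len(a) != len(b):
        raise ValueError("naive template matching needs equal-length inputs")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(samples):
    """Return the word whose stored template is closest to the input samples."""
    return min(TEMPLATES, key=lambda word: distance(samples, TEMPLATES[word]))
```

`recognize([0.0, 0.7, 0.35, -0.4, 0.1])` returns `"yes"`. But if the same "yes" starts one sample earlier, `[0.8, 0.3, -0.5, 0.1, 0.0]`, it ends up exactly as far from the "no" template as from its own, which is the kind of brittleness that kept such systems out of practical use.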

While this approach sometimes worked, it was extremely fragile. Everyone’s voice is different, and even the same person can say the same word in different ways. As a result, these early systems were not really usable for practical applications.

In the late 1970s, a group of researchers at Carnegie Mellon University made a significant breakthrough in speech recognition using a technique called hidden Markov modeling, which allowed them to use training data from many speakers to build statistical speech models that were much more robust. As a result, speech systems have gotten steadily better over the last 30 years. In the last 10 years, the combination of better methods, faster computers and the ability to process dramatically more data has led to many practical uses.
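A hidden Markov model treats a word as a chain of hidden states, each of which emits acoustic observations with some probability, so variation between speakers is absorbed into probabilities instead of requiring an exact match. The sketch below scores an observation sequence with the standard forward algorithm and picks the word model with the highest likelihood. The two-state models, the symbols `"a"` and `"b"` (stand-ins for quantized acoustic features) and every probability are made up for illustration; in a real system they would be estimated from recordings of many speakers, a training step not shown here.

```python
# Toy HMM word recognizer. All models and probabilities are hypothetical;
# real systems estimate them from recorded speech of many speakers.

def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) using the forward algorithm.

    start[i]    -- probability of starting in state i
    trans[i][j] -- probability of moving from state i to state j
    emit[i][o]  -- probability that state i emits observation symbol o
    """
    n = len(start)
    # alpha[i] = probability of the observations so far, ending in state i
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
            for j in range(n)
        ]
    return sum(alpha)

# Left-to-right two-state models: state 0 may stay or advance, state 1 stays.
START = [1.0, 0.0]
TRANS = [[0.6, 0.4], [0.0, 1.0]]
WORD_MODELS = {
    "yes": {"start": START, "trans": TRANS,
            "emit": [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]},
    "no":  {"start": START, "trans": TRANS,
            "emit": [{"a": 0.1, "b": 0.9}, {"a": 0.8, "b": 0.2}]},
}

def recognize(obs):
    """Pick the word whose model gives the observations the highest likelihood."""
    return max(
        WORD_MODELS,
        key=lambda word: forward_likelihood(obs, **WORD_MODELS[word]),
    )
```

`recognize(["a", "a", "b", "b"])` returns `"yes"`. Because each state can repeat, a speaker who lingers a little longer or shorter on part of the word still gets a high score, which is one reason these statistical models proved far more robust than fixed templates.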

Today, if you call a bank in the US, you are almost certainly talking to a computer that can answer simple questions about your account and connect you to a real person if necessary. Several products on the market today, including Xbox Kinect, use speech input to provide simple answers or to navigate a user interface. In fact, our Microsoft Windows and Office products have included speech recognition since the late 1990s. This functionality has been invaluable to our customers with accessibility needs.

Until recently though, even the best speech systems still had word error rates of 20-25% on arbitrary speech.

Just over two years ago, researchers at Microsoft Research and the University of Toronto made another breakthrough. Using a technique called deep neural networks, which is patterned after the behavior of the human brain, researchers were able to train speech recognizers that were more discriminative and more accurate than those built with previous methods.

During my October 25 presentation in China, I had the opportunity to showcase the latest results of this work. We have been able to reduce the word error rate for speech by over 30% compared to previous methods. This means that rather than one word in 4 or 5 being incorrect, the error rate is now one word in 7 or 8. While still far from perfect, this is the most dramatic change in accuracy since the introduction of hidden Markov modeling in 1979, and as we add more training data, we believe we will get even better results.
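Word error rate is conventionally computed by aligning a recognizer's output with a reference transcript, counting the word substitutions, deletions and insertions, and dividing by the number of reference words. The sketch below does this with a word-level edit distance and also shows the arithmetic behind a relative reduction. The function names `word_error_rate` and `relative_reduction` are hypothetical, and the example sentences are invented, not data from the demonstration.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)

def relative_reduction(old_rate, new_rate):
    """Fraction by which the error rate dropped, relative to the old rate."""
    return (old_rate - new_rate) / old_rate
```

For example, a 30% relative reduction from a 20% error rate (one word in 5) leaves 14%, roughly one word in 7: `relative_reduction(0.20, 0.14)` is 0.3 up to floating-point rounding.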

Machine translation of text is similarly difficult. Just as with speech, the research community has been working on translation for the last 60 years, and, as with speech, the introduction of statistical techniques and Big Data has revolutionized machine translation over the last few years. Today, millions of people use products like Bing Translator every day to translate web pages from one language to another.

In my presentation, I showed how we take the text that represents my speech and run it through translation, in this case turning my English into Chinese in two steps. The first step takes my words and finds their Chinese equivalents; while non-trivial, this is the easy part. The second step reorders the words to be appropriate for Chinese, which is essential for correct translation between languages.
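The two steps can be illustrated with a deliberately tiny sketch: a dictionary lookup followed by a single reordering rule. Statistical systems learn both steps from large amounts of data instead; the `LEXICON`, the one rule in `reorder` and the example sentence are hypothetical, and the rule assumes the subject is a single word at the start of the sentence.

```python
# Hypothetical toy lexicon; None marks English words with no direct equivalent.
LEXICON = {
    "i": "我",
    "gave": "做了",
    "a": None,
    "speech": "演讲",
    "in": "在",
    "tianjin": "天津",
}

def lookup(words):
    """Step 1: replace each English word with its Chinese equivalent."""
    tokens = []
    for word in words:
        zh = LEXICON[word.lower()]
        if zh is not None:
            tokens.append(zh)
    return tokens

def reorder(tokens):
    """Step 2 (toy rule): move a trailing location phrase ("在" + place)
    in front of the verb, as Chinese word order requires.
    Assumes the subject is the single token at position 0."""
    if len(tokens) >= 4 and tokens[-2] == "在":
        location = tokens[-2:]
        return tokens[:1] + location + tokens[1:-2]
    return tokens

def translate(sentence):
    """Run both steps and join the tokens (Chinese uses no spaces)."""
    return "".join(reorder(lookup(sentence.split())))
```

`translate("I gave a speech in Tianjin")` returns `"我在天津做了演讲"`. English puts "in Tianjin" after the verb while Chinese puts the location before it, so the lookup output alone (`我做了演讲在天津`) is not correct Chinese; that gap is what the reordering step closes.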

Of course, there are still likely to be errors in both the English text and the translation into Chinese, and the results can sometimes be humorous. Still, the technology has become quite useful.

Most significantly, we have attained an important goal by enabling an English speaker like me to present in Chinese in his or her own voice, which is what I demonstrated in China. This required a text-to-speech system that Microsoft researchers built using a few hours of speech from a native Chinese speaker and properties of my own voice taken from about one hour of pre-recorded English data, in this case recordings of previous speeches I’d made.

Though it was a limited test, the effect was dramatic, and the audience came alive in response. When I spoke in English, the system automatically combined all the underlying technologies to deliver a robust speech-to-speech experience: my voice speaking Chinese. You can see the demo in the video above.

The results are still not perfect, and there is still much work to be done, but the technology is very promising, and we hope that in a few years we will have systems that can completely break down language barriers.

In other words, we may not have to wait until the 22nd century for a usable equivalent of Star Trek’s universal translator, and we can also hope that as barriers to understanding language are removed, barriers to understanding each other might also be removed. The cheers from the crowd of 2,000 mostly Chinese students, and the commentary that has grown on China’s social media forums ever since, suggest a growing community of budding computer scientists who feel the same way.

Originally posted on Next at Microsoft. Republished with kind permission.
