Google brings Natural Language Processing to languages not usually found on the web

Natural Language Processing remains a largely unsolved computing problem. It is an important step towards machines and software that can “understand” humans. While there have been significant advances in some areas of speech recognition and understanding, they typically reach only a small percentage of the world's population, limited by which languages the software supports and by where that software is available.

With a new initiative, however, Google are attempting to address that gap.

Voice Search is a Google product that started life as ‘GOOG-411’, a freephone number in the US that could be called to request information. Speech recognition technology turned the caller's voice into a search query and then read out the results. That service was shut down today. However, its legacy lives on in several forms: the latest Google Chrome builds support voice input on correctly marked-up input fields on web pages; you can control your Android phone with Voice Actions; and you can perform searches on iPhone, BlackBerry, Nokia S60 and Android devices with the Voice Search functionality.

To advance the field of Natural Language Processing and bring Voice Search to more people, Google have begun introducing voice search for what they are calling “underrepresented languages”. “We define underrepresented languages as those which, while spoken by millions, have little presence in electronic and physical media, e.g., webpages, newspapers and magazines,” writes Pedro Moreno, a scientist at Google. Using an Android application that captures voice samples in various environments and accents (on which Google have published a paper), the team have mapped Afrikaans and Zulu to lexicons and language models, and both languages are now supported in the Voice Search applications. By combining the voice samples, text searches and documents on the web, and local knowledge (especially important with Zulu, where there is very little web content), they were able to produce much more accurate results than were previously available.

Google say they will continue this initiative to help make more underrepresented languages accessible to Natural Language Processing research.
