Google has released new details about its Universal Speech Model (USM), a system the company describes as a “critical first step” toward its goal of building an AI language model that supports 1,000 languages.
With USM, the company is moving closer to that goal as it races to beat ChatGPT.
In November last year, the company announced USM along with its plan to build a language model supporting the world’s 1,000 most spoken languages.
The tech giant describes USM as a collection of cutting-edge speech models with 2 billion parameters trained on 12 million hours of speech and 28 billion sentences of text in 300+ languages.
“USM, which is meant to be used on YouTube (for example, for closed captioning), can do automatic speech recognition (ASR) not only on widely spoken languages like English and Mandarin, but also on under-resourced languages like Amharic, Cebuano, Assamese, and Azerbaijani, to name a few,” Google wrote in a blog post.
Google says that USM currently supports more than 100 languages and will serve as the “base” for a much bigger system.
Meanwhile, Google is expected to release a slew of AI capabilities across its products in the next few months, including the integration of the Imagen text-to-image generator into Gboard for Android.