Microsoft Built Supercomputer for OpenAI to Train Large AI Models

Microsoft has built a supercomputer for the artificial intelligence (AI) research startup OpenAI to train large sets of models. The company linked thousands of Nvidia A100 graphics chips to support the ChatGPT and Bing AI chatbots.

The Windows maker committed to building a “massive, cutting-edge supercomputer” as part of its $1 billion investment in OpenAI.

Why Did Microsoft Build a Supercomputer?

The purpose of this supercomputer is to offer the computational power required to train and retrain an ever-growing number of AI models using massive amounts of data over protracted periods.

Nidhi Chappell, Microsoft’s head of product for Azure high-performance computing and AI, stated, “One of the things we had learnt from research is that the larger the model, the more data you have, and the longer you can train, the greater the accuracy of the model is.”

“There was also a tremendous push to get bigger models trained for a more extended period,” she continued, adding that in addition to having the largest infrastructure, you also need to be able to run it dependably for an extended amount of time.

At its Build developer conference in 2020, Microsoft said it was building a supercomputer for OpenAI that would be hosted in Azure and used exclusively to train AI models.

Greg Brockman, president and co-founder of OpenAI, said, “Co-designing supercomputers with Azure has been necessary for scaling our demanding AI training needs, making our research and alignment work on systems like ChatGPT possible.”

Microsoft Supercomputer Architecture

To train AI models, Microsoft connected tens of thousands of Nvidia A100 graphics chips and changed how it arranged servers on racks to prevent power outages.

That is what made ChatGPT possible, Chappell told Bloomberg. It is one model that came out of the system, she said, and there will be many more.

Scott Guthrie, the executive vice president of Microsoft responsible for cloud and AI, stated that the project’s cost is “definitely more” than several hundred million dollars.

Microsoft Azure Is Getting More Power

OpenAI also requires a powerful cloud architecture in addition to a supercomputer to process and train models. Microsoft is already working on enhancing the AI capabilities of Azure Cloud with new virtual machines that utilize Quantum-2 InfiniBand networking and Nvidia’s H100 and A100 Tensor Core GPUs.

With this, Microsoft will make it possible for OpenAI and other businesses that rely on Azure to train bigger and more intricate AI models. Some businesses are already deploying AI chatbots.

According to Eric Boyd, corporate vice president of Azure AI at Microsoft, OpenAI was one of the earliest proof points demonstrating the need to create special-purpose clusters centered on supporting enormous training workloads. “We worked closely with them to identify what are the important things they were looking for as they built out their training environments and what were the essential things they needed,” he said.