The cost of training an AI model fell more than 100-fold between 2017 and 2019, yet it remains too expensive for most startups to this day. That suits big companies like Nvidia and Microsoft, which are pouring massive engineering talent and money into ever larger and more capable artificial intelligence models for natural language processing, better search results, automotive technology, and more. One caveat up front: measuring bias in these models is the easy part; identifying and eliminating it is an unresolved problem.
Nvidia and Microsoft announced Monday that they have been working on the Megatron-Turing Natural Language Generation model (MT-NLG). The two companies claim to have trained the world's largest and most powerful "monolithic transformer language model" to date.
To put its size in perspective, the famous GPT-3, which has grabbed headlines for the past few years, contains 175 billion parameters. By comparison, the new MT-NLG model has 105 layers and no fewer than 530 billion parameters. MT-NLG succeeds the Turing NLG 17B and Megatron-LM models and has demonstrated "unmatched accuracy" across a variety of natural language tasks such as reading comprehension, commonsense reasoning, completion prediction, word sense disambiguation, and natural language inference.
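As a rough sanity check on that figure, a standard back-of-the-envelope estimate for a decoder-only transformer is about 12 · layers · hidden² parameters. The layer count of 105 is stated above; the hidden size of 20,480 is an assumption taken from the published MT-NLG description, not from this article:

```python
# Back-of-the-envelope transformer parameter estimate: ~12 * layers * hidden^2.
# layers = 105 is quoted in the article; hidden = 20480 is an outside assumption.
layers = 105
hidden = 20480

approx_params = 12 * layers * hidden ** 2
print(f"{approx_params / 1e9:.1f}B parameters")  # ≈ 528.5B, close to the quoted 530B
```

The estimate lands within half a percent of the quoted 530 billion, which suggests the headline number is essentially the weight count of the transformer stack itself.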
Photo: Nvidia A100 GPU
Nvidia and Microsoft trained this massive AI model on a supercomputer called Selene. The system comprises 560 Nvidia DGX A100 servers, each packing eight A100 GPUs with 80 GB of VRAM that communicate over NVLink and NVSwitch interconnects. Microsoft notes that this configuration is similar to the reference architecture used by its Azure NDv4 cloud supercomputers.
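Those figures multiply out quickly. A minimal sketch of the aggregate hardware, using only the numbers quoted above:

```python
# Selene figures as quoted: 560 DGX A100 servers, 8 GPUs each, 80 GB VRAM per GPU.
servers = 560
gpus_per_server = 8
vram_per_gpu_gb = 80

total_gpus = servers * gpus_per_server               # 4,480 A100 GPUs in total
total_vram_tb = total_gpus * vram_per_gpu_gb / 1000  # 358.4 TB of aggregate VRAM
print(total_gpus, total_vram_tb)
```

In other words, the cluster offers 4,480 A100s and roughly 358 TB of GPU memory, which is why a half-trillion-parameter model is feasible on it at all.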
Interestingly enough, Selene also runs on AMD EPYC 7742 processors. According to the folks at The Next Platform, Selene cost about $85 million to build, and that assumes a 75 percent discount on data center equipment. Microsoft says MT-NLG was trained on 15 datasets containing more than 339 billion tokens drawn from English-language web sources, such as academic journals, online communities like Wikipedia and Stack Exchange, code repositories scraped from GitHub, news sites, and more. The largest of these is The Pile, an 825 GiB collection of English text. A sample of the training mix (dataset | source | tokens in billions | sampling weight % | epochs):

PubMed Abstracts | Pile | 4.4 | 2.9 | 1.8
Wikipedia | Pile | 4.2 | 4.8 | 3.2
Gutenberg (PG-19) | Pile | 2.7 | 0.9 | 0.9
GitHub | Pile | 24.3 | 1.6 | 0.2
CC-2020-50 | Common Crawl (CC) | 68.7 | 13.0 | 0.5
CC-2021-04 | Common Crawl (CC) | 82.6 | 15.7 | 0.5
RealNews | RealNews | 21.9 | 9.0 | 1.1
CC-Stories | Common Crawl (CC) | 5.3 | 0.9 | 0.5
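To make the sampling idea concrete, here is a small sketch using the recoverable rows of the training-mix table; the tokens actually consumed from a dataset are roughly its token count times the number of epochs (all token figures in billions, taken from the table):

```python
# Rows recovered from the article's training-mix table:
# (dataset, tokens in billions, sampling weight %, epochs)
mix = [
    ("PubMed Abstracts",   4.4,  2.9, 1.8),
    ("Wikipedia",          4.2,  4.8, 3.2),
    ("Gutenberg (PG-19)",  2.7,  0.9, 0.9),
    ("GitHub",            24.3,  1.6, 0.2),
    ("CC-2020-50",        68.7, 13.0, 0.5),
    ("CC-2021-04",        82.6, 15.7, 0.5),
    ("RealNews",          21.9,  9.0, 1.1),
    ("CC-Stories",         5.3,  0.9, 0.5),
]

# Approximate tokens consumed per dataset = size * epochs (still in billions).
consumed = {name: round(tokens * epochs, 1) for name, tokens, _, epochs in mix}
print(consumed["Wikipedia"])   # 13.4: a small, high-quality set passed over 3.2 times
print(consumed["CC-2021-04"])  # 41.3: a huge web crawl only half-sampled
```

The asymmetry is the point of the weighting: small, high-quality sources are repeated across multiple epochs, while the massive Common Crawl snapshots are subsampled.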
Overall, the project showed that larger AI models require less training to perform well. One problem that remains unresolved, however, is bias. Even when trained on diverse, real-world data, giant models pick up biases, stereotypes, and toxicity from that data.
This may come as news to some, but it has been known for years that AI models tend to amplify the biases present in the data they are fed. That is because the datasets are collected from a range of online sources where gender, racial, and religious biases are commonplace. The biggest challenge in solving the problem is determining the extent of the bias, which is no small task and one that keeps evolving, no matter how many resources are devoted to it.
Some of you may remember Microsoft's earlier experience launching a Twitter chatbot called Tay. Tay took only a few hours to acquire the worst traits humans could teach it, and Redmond had to pull it offline less than 24 hours after release. Both Nvidia and Microsoft say they are committed to addressing the issue and will do all they can to support research into it. At the same time, they warn that organizations wishing to use MT-NLG in production must ensure that appropriate measures are in place to mitigate and minimize potential harm to users. Microsoft notes that any use of the AI must comply with the principles of reliability, safety, privacy, transparency, and accountability set out in its Responsible AI guidelines.
Microsoft and Nvidia have created the world's largest and most powerful language model to date, but it's still biased