Open source and commercially usable: the 30-billion-parameter MPT-30B costs only a fraction of GPT-3 to train

Babbitt

AI model development company MosaicML recently released MPT-30B, a new commercially usable, open-source large language model with 30 billion parameters. It is significantly more powerful than the previous-generation MPT-7B (7 billion parameters), and its performance surpasses the original GPT-3.


In addition, they released two fine-tuned models, MPT-30B-Instruct and MPT-30B-Chat, which build on MPT-30B and specialize in single-turn instruction following and multi-turn dialogue, respectively.
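As a rough illustration (not from the original article), the chat variant can be driven through Hugging Face transformers. The checkpoint name `mosaicml/mpt-30b-chat` and the `trust_remote_code=True` flag follow MosaicML's published conventions for the MPT series, but treat the exact arguments as assumptions:

```python
# Minimal sketch: generating a reply with MPT-30B-Chat via Hugging Face transformers.
# Assumes the "mosaicml/mpt-30b-chat" checkpoint and trust_remote_code usage as
# documented for the MPT series; adjust dtype/device settings to your hardware.
import torch
import transformers

name = "mosaicml/mpt-30b-chat"
tokenizer = transformers.AutoTokenizer.from_pretrained(name)
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,  # 30B weights are large; bf16 halves memory
    trust_remote_code=True,      # MPT ships custom modeling code
    device_map="auto",           # requires `accelerate` to shard across GPUs
)

prompt = "Explain what ALiBi positional encoding does, in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```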

Features of the MPT-30B model:

  • 8k token context window during training
  • Support for longer contexts via ALiBi
  • Efficient inference and training performance via FlashAttention
  • Strong coding ability, thanks to the pre-training data mix

The model was trained with an 8k-token context window on NVIDIA H100 GPUs, which MosaicML says makes it the first LLM trained on H100s.
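For concreteness, here is a minimal sketch of how the ALiBi-based context extension is typically exercised with the MPT series on Hugging Face. The `max_seq_len` config field follows MosaicML's published MPT usage notes, but treat its availability on the 30B checkpoint as an assumption:

```python
# Sketch: raising the context window beyond the 8k training length.
# ALiBi biases attention by relative distance, so an MPT model can be run at a
# longer max_seq_len than it was trained with (quality may degrade gradually).
import transformers

name = "mosaicml/mpt-30b"
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 16384  # assumed field name, per MosaicML's MPT usage notes

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True,
)
```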

MPT-30B stronger than GPT-3?

MPT-30B is a commercially usable, Apache 2.0-licensed open-source base model that is stronger than the original GPT-3 and competitive with other open-source models such as LLaMA-30B and Falcon-40B.

(Top) Zero-shot accuracy of MPT-30B versus GPT-3 on nine in-context learning (ICL) tasks. MPT-30B outperforms GPT-3 on six of the nine metrics.

MosaicML trained MPT-30B over two months on a cluster of NVIDIA H100 GPUs.

The figure below shows MPT-30B's training data mix:

MPT-30B was pre-trained on a data mixture: 1T tokens of pre-training data were drawn from 10 different open-source text corpora, segmented with the EleutherAI GPT-NeoX-20B tokenizer, and sampled according to the ratios shown in the figure.
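The GPT-NeoX-20B tokenizer mentioned here is available on Hugging Face, so the tokenization step can be sketched roughly as follows (the sample text is only an illustration, not part of the actual corpus):

```python
# Sketch: tokenizing raw text with the EleutherAI GPT-NeoX-20B tokenizer,
# the same tokenizer the article says was used to segment MPT-30B's 1T tokens.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

text = "MosaicML pre-trained MPT-30B on a mix of ten open-source corpora."
ids = tokenizer(text)["input_ids"]
print(len(ids), ids[:10])  # token count and a peek at the first few ids
```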

Comparison of MPT-7B and MPT-30B

MPT-30B Training Cost

Naveen Rao, CEO and co-founder of MosaicML, said the training cost of MPT-30B was about US$700,000 (roughly 5.02 million yuan), far lower than the tens of millions of dollars required for comparable models such as GPT-3.

How much time and money would it take to train a custom MPT-30B model? Let's start with the base model.

The figure above shows the time and cost of pre-training MPT-30B from scratch on A100 or H100 GPUs. With MosaicML's infrastructure, you can train your own custom MPT-30B from scratch on 1T tokens in two weeks.
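The article does not reproduce the exact figures here, but the time/cost trade-off it describes reduces to simple arithmetic. The sketch below is a back-of-envelope estimator with deliberately hypothetical throughput and price numbers, not MosaicML's published ones:

```python
# Back-of-envelope estimate of pre-training time and cost.
# All inputs are hypothetical placeholders; plug in your own cluster numbers.
def pretrain_estimate(total_tokens, tokens_per_sec_per_gpu, num_gpus, usd_per_gpu_hour):
    gpu_seconds = total_tokens / tokens_per_sec_per_gpu       # single-GPU seconds of work
    wall_clock_hours = gpu_seconds / num_gpus / 3600           # assumes ideal linear scaling
    cost_usd = wall_clock_hours * num_gpus * usd_per_gpu_hour  # total GPU-hour spend
    return wall_clock_hours / 24, cost_usd

days, cost = pretrain_estimate(
    total_tokens=1e12,            # the 1T tokens mentioned in the article
    tokens_per_sec_per_gpu=3000,  # hypothetical per-GPU throughput for a 30B model
    num_gpus=256,                 # hypothetical cluster size
    usd_per_gpu_hour=2.50,        # hypothetical GPU-hour price
)
print(f"~{days:.1f} days, ~${cost:,.0f}")
```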

What if you don’t want to train from scratch, but just fine-tune an existing model?

The figure below details the time and cost of fine-tuning MPT-30B per 1B tokens. With MosaicML's infrastructure, you can fully fine-tune MPT-30B without worrying about system memory limits, for only a few hundred dollars.
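The article does not show the fine-tuning setup itself; as one possible illustration, a minimal Hugging Face `Trainer` loop gives the flavor. This is a generic sketch, not MosaicML's own tooling, and the dataset name is a hypothetical placeholder:

```python
# Sketch: fine-tuning MPT-30B as a causal LM with Hugging Face Trainer.
# A 30B model needs multi-GPU sharding in practice (e.g. FSDP or DeepSpeed);
# this shows only the basic wiring, with a hypothetical instruction dataset.
import transformers
from datasets import load_dataset

name = "mosaicml/mpt-30b"
tokenizer = transformers.AutoTokenizer.from_pretrained(name)
tokenizer.pad_token = tokenizer.eos_token  # MPT's tokenizer has no pad token by default
model = transformers.AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)

dataset = load_dataset("my_org/my_instruction_data", split="train")  # placeholder

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = transformers.TrainingArguments(
    output_dir="mpt-30b-finetuned",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    bf16=True,
    num_train_epochs=1,
)

trainer = transformers.Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    # mlm=False makes the collator copy input_ids into labels (causal-LM objective)
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```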

MosaicML said that scaling the model to 30 billion parameters is only the first step; next, it plans to release larger, higher-quality models while continuing to drive costs down.
