AI large-model developer MosaicML recently released MPT-30B, a new commercially usable, open-source large language model with 30 billion parameters. It is significantly more powerful than the previous-generation MPT-7B (7 billion parameters), and its performance surpasses the original GPT-3.
In addition, MosaicML released two fine-tuned models, MPT-30B-Instruct and MPT-30B-Chat, which build on MPT-30B and specialize in single-turn instruction following and multi-turn dialogue, respectively.
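As a rough illustration of how the chat variant might be called from the Hugging Face transformers library, here is a minimal sketch; the mosaicml/mpt-30b-chat checkpoint name and the ChatML-style prompt format are assumptions based on MosaicML's Hugging Face releases, not details given in this article.

```python
# Minimal sketch: generating a reply with MPT-30B-Chat via Hugging Face transformers.
# Assumptions: the "mosaicml/mpt-30b-chat" checkpoint and its ChatML-style prompt
# format; neither is specified in this article. A 30B-parameter model needs a large
# GPU (or device_map="auto" with offloading, which requires the accelerate package).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mosaicml/mpt-30b-chat"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,   # MPT models ship custom modeling code
    device_map="auto",
)

# ChatML-style prompt (assumed format for the chat fine-tune).
prompt = "<|im_start|>user\nExplain what a context window is.<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```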
Features of the MPT-30B model:
The model's context window was extended to 8k tokens, and training was carried out on NVIDIA H100 GPUs, making it, according to MosaicML, the first LLM trained on H100s.
MPT-30B is an open-source base model released under the commercially usable Apache 2.0 license; it is stronger than the original GPT-3 and competitive with other open-source models such as LLaMA-30B and Falcon-40B.
(Top) Zero-shot accuracy of MPT-30B versus GPT-3 on nine in-context learning (ICL) tasks. MPT-30B outperforms GPT-3 on six of the nine metrics.
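To relate the 8k-token context window mentioned above to the released checkpoint, the sketch below reads that setting from the model config; the mosaicml/mpt-30b checkpoint name and the max_seq_len attribute are assumptions about MosaicML's Hugging Face release rather than details from this article.

```python
# Minimal sketch: reading the configured context window of the MPT-30B base model.
# Assumption: the "mosaicml/mpt-30b" Hub checkpoint exposes `max_seq_len` in its
# custom config (hence trust_remote_code=True); the name is not given in the article.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("mosaicml/mpt-30b", trust_remote_code=True)
print("context window:", config.max_seq_len, "tokens")  # expected: 8192
```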
MosaicML trained MPT-30B over roughly two months on an NVIDIA H100 GPU cluster.
The figure below shows the composition of MPT-30B's training data:
MPT-30B was pre-trained on a mixture of 1T tokens collected from ten open-source text corpora; the text was tokenized with the EleutherAI GPT-NeoX-20B tokenizer, and the corpora were sampled according to the ratios shown above.
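Since the pre-training corpus was tokenized with the EleutherAI GPT-NeoX-20B tokenizer, token counts toward such a budget can be reproduced with that tokenizer directly. The sketch below is a minimal example; the Hugging Face checkpoint name EleutherAI/gpt-neox-20b is an assumption, and the sample text is purely illustrative.

```python
# Minimal sketch: counting tokens with the same tokenizer used to build the
# 1T-token MPT-30B pre-training mix. Assumes the tokenizer is available on the
# Hugging Face Hub as "EleutherAI/gpt-neox-20b" (not stated in this article).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
sample = "MosaicML released MPT-30B, a 30-billion-parameter open-source LLM."
token_ids = tokenizer(sample)["input_ids"]
print(len(token_ids), "tokens")  # how this sentence counts toward a token budget
```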
Comparison of MPT-7B and MPT-30B
Naveen Rao, CEO and co-founder of MosaicML, said that training MPT-30B cost about US$700,000 (roughly 5.0244 million yuan), far below the tens of millions of dollars required for comparable models such as GPT-3.
How much time and money does it take to train a custom MPT-30B model? Let's start with the base model.
The figure above shows the time and cost of pre-training MPT-30B from scratch on A100 or H100 GPUs. With MosaicML's infrastructure, you can train a custom MPT-30B from scratch on 1T tokens in about two weeks.
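As a sanity check on these figures, the arithmetic below combines the 1T-token budget, the two-week schedule, and the roughly $700,000 cost quoted earlier into a required cluster throughput and a cost per million tokens. It is only back-of-the-envelope arithmetic on the numbers in this article, assuming the quoted cost applies to a comparable 1T-token run.

```python
# Back-of-the-envelope arithmetic using only figures quoted in this article:
# 1T pre-training tokens, a 2-week training run, and a ~$700,000 training cost.
tokens = 1e12          # 1T pre-training tokens
days = 14              # "from scratch ... in about two weeks"
cost_usd = 700_000     # figure quoted by MosaicML's CEO

seconds = days * 24 * 3600
throughput = tokens / seconds                 # cluster-wide tokens per second needed
cost_per_million = cost_usd / (tokens / 1e6)  # dollars per million tokens

print(f"required throughput: ~{throughput:,.0f} tokens/s across the cluster")
print(f"pre-training cost:   ~${cost_per_million:.2f} per million tokens")
```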
What if you don’t want to train from scratch, but just fine-tune an existing model?
The figure below details the time and cost of fine-tuning MPT-30B per 1B tokens. With MosaicML's infrastructure, you can fully fine-tune MPT-30B without worrying about system memory constraints, for only a few hundred dollars!
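To show how numbers like those in the figure can be reproduced, the sketch below turns an hourly GPU price and a fine-tuning throughput into a cost per 1B tokens. The input values are hypothetical placeholders chosen only to illustrate the arithmetic, not MosaicML's published figures.

```python
# Illustrative fine-tuning cost calculator. The inputs below are hypothetical
# placeholder values chosen only to show the arithmetic; read the real numbers
# off MosaicML's figure (or your own benchmarks) before relying on the output.
def finetune_cost_per_billion_tokens(gpu_hourly_usd: float,
                                     num_gpus: int,
                                     tokens_per_second_per_gpu: float) -> float:
    """Cost (USD) to push 1B tokens through fine-tuning at the given throughput."""
    cluster_tokens_per_hour = tokens_per_second_per_gpu * num_gpus * 3600
    hours_per_billion = 1e9 / cluster_tokens_per_hour
    return hours_per_billion * num_gpus * gpu_hourly_usd

# Hypothetical example: 8 GPUs at $2.50/hour each, 1,500 tokens/s per GPU.
print(f"${finetune_cost_per_billion_tokens(2.50, 8, 1500):,.0f} per 1B tokens")
```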
MosaicML said that scaling the model to 30 billion parameters is only the first step; it plans to release larger, higher-quality models while continuing to bring costs down.