MIT collaborates with NVIDIA to develop TLT technology, achieving up to a 210% improvement in training efficiency for reasoning AI large models
IT Home reported on February 28 that MIT News published a blog post on February 26 announcing that the Massachusetts Institute of Technology (MIT), together with NVIDIA and other organizations, has released a technique called "Tail Taming" (TLT) that can significantly improve the training efficiency of reasoning large language models (LLMs).
Citing details from the blog post, IT Home notes that reasoning large models are good at solving complex problems by breaking them down into steps, but that during reinforcement learning (RL) training they consume enormous amounts of computing power and energy.
The research team found that the "rollout" stage, in which multiple candidate answers are generated, accounts for as much as 85% of total training time. Because different processors generate responses of varying lengths, processors that finish early are forced to sit idle while waiting for others to complete long-text generations, creating a serious efficiency bottleneck.
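The cost of this "tail" can be illustrated with a toy calculation. The sketch below is not from the paper; the worker count and generation times are invented to show how one long response dominates a synchronized rollout step and leaves most worker-seconds idle.

```python
import random

# Toy illustration of the rollout "tail" bottleneck: 8 workers each
# generate one candidate answer, and the step ends only when the
# slowest worker finishes. Times are simulated, not measured.
random.seed(0)
NUM_WORKERS = 8

# Seven short responses plus one long "tail" response (seconds).
times = [random.uniform(1.0, 3.0) for _ in range(NUM_WORKERS - 1)] + [20.0]

step_time = max(times)                 # barrier: wait for the slowest worker
busy_time = sum(times)                 # useful generation work actually done
idle_time = NUM_WORKERS * step_time - busy_time
utilization = busy_time / (NUM_WORKERS * step_time)

print(f"step time:   {step_time:.1f}s")
print(f"utilization: {utilization:.0%}")
```

With these made-up numbers, utilization falls well below 30%: the step takes as long as the single 20-second generation, while the other seven workers finish in a few seconds each and then wait.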
To address this pain point, MIT researchers, together with NVIDIA, the Swiss Federal Institute of Technology, and other institutions, proposed an adaptive solution called “Tail Taming (TLT).”
At the core of the approach is an innovative use of "speculative decoding": a smaller "draft model" (drafter) is trained to quickly predict the large model's upcoming outputs, and the large model then batch-verifies these guesses. This way, the large model no longer needs to generate tokens one by one in sequence, greatly speeding up processing.
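The draft-then-verify loop can be sketched in a few lines. This is a generic greedy speculative-decoding sketch, not TLT's implementation: both "models" here are toy deterministic next-token functions over integers, standing in for the small drafter and the large target model. The key property the sketch preserves is that the final output is identical to what the target model would produce decoding alone.

```python
# Generic sketch of speculative decoding with greedy verification.
# target_next / draft_next are illustrative stand-ins, not real LMs.

def target_next(ctx):
    """Stand-in for the large model's greedy next token (ground truth)."""
    return (sum(ctx) * 31 + 7) % 50

def draft_next(ctx):
    """Stand-in for the small drafter: agrees with the target most of the time."""
    tok = target_next(ctx)
    return tok if sum(ctx) % 5 else (tok + 1) % 50  # occasionally wrong

def speculative_step(ctx, k=4):
    """Drafter proposes k tokens; the target verifies them in one batched pass."""
    draft, c = [], list(ctx)
    for _ in range(k):                # cheap sequential drafting
        t = draft_next(c)
        draft.append(t)
        c.append(t)
    accepted, c = [], list(ctx)
    for t in draft:                   # one batched target pass, simulated as a loop
        correct = target_next(c)
        if t == correct:
            accepted.append(t)        # draft token verified, keep it
            c.append(t)
        else:
            accepted.append(correct)  # first mismatch: take the target's token
            break
    else:
        accepted.append(target_next(c))  # all drafts accepted: one bonus token
    return accepted

ctx, out = [1, 2, 3], []
while len(out) < 12:
    out.extend(speculative_step(ctx + out))
print(out[:12])
```

Each step accepts at least one token and up to k + 1, so the target model's expensive passes are amortized over several tokens whenever the drafter guesses well.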
In conventional speculative decoding, the draft model is usually trained only once and kept static. However, in reinforcement learning, the main model needs to be updated thousands of times, and a static draft model quickly becomes ineffective.
Therefore, the TLT system introduces an “adaptive draft trainer.” Once some processors finish short queries and enter an idle state, the system immediately schedules them to train the draft model in real time.
At the same time, an “adaptive rollout engine” automatically adjusts the decoding strategy based on workload characteristics to ensure that the draft model stays highly synchronized with the target large model, without adding extra computational overhead.
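The scheduling idea behind the adaptive draft trainer can be sketched as follows. The function names and the simple barrier model are illustrative assumptions, not the TLT system's actual scheduler: the point is only that idle windows before the step barrier are filled with drafter-training work instead of waiting.

```python
# Hedged sketch of the "adaptive draft trainer" scheduling idea:
# workers that finish rollouts early retrain the drafter until the
# step barrier, instead of idling. Names and structure are invented.

def schedule(rollout_times):
    """Return (worker, busy_until, task) records for one training step."""
    step_end = max(rollout_times)     # synchronized step barrier
    log = []
    for w, t in enumerate(rollout_times):
        log.append((w, t, "rollout"))
        if t < step_end:
            # Fill the idle window with drafter-training work, keeping
            # the drafter synchronized with the frequently-updated target.
            log.append((w, step_end, "train_drafter"))
    return log

log = schedule([3.0, 5.0, 20.0, 4.0])
drafting = [w for (w, _, task) in log if task == "train_drafter"]
print(drafting)  # → [0, 1, 3]: three workers retrain the drafter while worker 2 finishes
```

This is exactly the window that would otherwise be wasted in the straggler-dominated rollout stage, so the drafter updates come at essentially no extra compute cost.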
Tests on real-world datasets show that, with no loss of model accuracy, TLT increases the training speed of multiple reasoning large language models by 70% to 210%.
Moreover, the lightweight draft model produced during training comes as a free byproduct and can be used directly for efficient deployment later. Going forward, the research team plans to integrate the technique into more training and inference frameworks, further reducing AI development costs and improving energy utilization.