PyTorch TorchInductor integrates CuteDSL as a matrix-multiplication auto-tuning backend
On April 7 (UTC+8), the PyTorch team announced that CuteDSL has been integrated into TorchInductor as the fourth matrix-multiplication auto-tuning backend. The backend was selected against three criteria: it must not add excessive maintenance burden, it must not slow down compilation or benchmarking, and it must deliver better performance on targeted workloads.

CuteDSL is actively developed by NVIDIA and provides optimized kernel templates. Its compilation time is comparable to the existing backends and significantly shorter than the CUTLASS C++ path, which requires a full `nvcc` compilation. The new backend is built on the same abstractions as CUTLASS C++, but it is written in Python, compiles faster, is easier to maintain, and has shown strong performance on FP8 GEMM and epilogue fusion.

The team focuses on optimizing GEMM (general matrix multiplication) because it accounts for the majority of the computational cost in Transformer models. CuteDSL generates low-level code from handcrafted, optimized templates, avoiding the complexity of writing kernels from scratch, while fully exposing the thread and memory hierarchy and supporting architecture-specific features. (Source: InfoQ)
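To make the auto-tuning idea concrete, here is a minimal, generic sketch of what a backend list buys the compiler: each registered backend contributes a candidate kernel, the tuner benchmarks every candidate on the target shape, and the fastest one wins. This is not TorchInductor's actual implementation; all function and "backend" names below are hypothetical, and pure-Python matmuls stand in for real GPU kernels.

```python
import time

def autotune(candidates, make_input, warmup=2, reps=5):
    """Benchmark each candidate kernel on a sample input; return the fastest.

    Mirrors, in miniature, how an auto-tuning backend list works: every
    backend contributes a candidate, each is timed on the target shape,
    and the winner would then be cached for that shape.
    """
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        a, b = make_input()
        for _ in range(warmup):   # warm-up runs, excluded from timing
            fn(a, b)
        start = time.perf_counter()
        for _ in range(reps):
            fn(a, b)
        elapsed = (time.perf_counter() - start) / reps
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name

def naive_mm(a, b):
    """Triple-loop matmul: stand-in for a slow fallback kernel."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for t in range(k):
                s += a[i][t] * b[t][j]
            out[i][j] = s
    return out

def zipped_mm(a, b):
    """Matmul over a pre-transposed b: stand-in for a faster kernel."""
    bt = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt]
            for row in a]

if __name__ == "__main__":
    size = 40
    candidates = {"naive": naive_mm, "zipped": zipped_mm}
    make_input = lambda: (
        [[float(i + j) for j in range(size)] for i in range(size)],
        [[float(i - j) for j in range(size)] for i in range(size)],
    )
    print("fastest backend:", autotune(candidates, make_input))
```

In the real system the candidates come from the registered GEMM backends (ATen, Triton, CUTLASS, and now CuteDSL), which is why per-candidate compile and benchmark time was one of the selection criteria: every extra backend adds to the tuning budget.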