cm0002@infosec.pub to AI - Artificial intelligence@programming.dev · English · 1 day ago

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

arxiv.org

  • cross-posted to:
  • technology@lemmy.ml
We present MegaTrain, a memory-centric system that efficiently trains 100B+ parameter large language models at full precision on a single GPU. Unlike traditional GPU-centric systems, MegaTrain stores parameters and optimizer states in host memory (CPU memory) and treats GPUs as transient compute engines. For each layer, we stream parameters in and compute gradients out, minimizing persistent device state. To battle the CPU-GPU bandwidth bottleneck, we adopt two key optimizations. 1) We introduce a pipelined double-buffered execution engine that overlaps parameter prefetching, computation, and gradient offloading across multiple CUDA streams, enabling continuous GPU execution. 2) We replace persistent autograd graphs with stateless layer templates, binding weights dynamically as they stream in, eliminating persistent graph metadata while providing flexibility in scheduling. On a single H200 GPU with 1.5TB host memory, MegaTrain reliably trains models up to 120B parameters. It also achieves 1.84× the training throughput of DeepSpeed ZeRO-3 with CPU offloading when training 14B models. MegaTrain also enables 7B model training with 512k token context on a single GH200.
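To make the two optimizations in the abstract more concrete, here is a rough toy sketch of the idea in plain Python. It is not MegaTrain's code, and it only covers a forward pass. Everything in it is an assumption made for illustration: `HOST_WEIGHTS`, `fetch_layer`, `layer_template` and `streamed_forward` are hypothetical names, a background thread stands in for a CUDA copy stream, and Python lists stand in for tensors. What it tries to show is the prefetch of layer i+1 being issued before layer i is computed, and a layer function that holds no weights of its own.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for host (CPU) memory: one weight vector per layer.
HOST_WEIGHTS = [[0.5, 1.0], [2.0, -1.0], [1.5, 0.25]]


def fetch_layer(index):
    """Simulate a host-to-device copy by returning a fresh copy of the weights."""
    return list(HOST_WEIGHTS[index])


def layer_template(activations, weights):
    """Stateless layer: weights are passed in, nothing is stored on the layer."""
    return [a * w for a, w in zip(activations, weights)]


def streamed_forward(activations, num_layers):
    """Run layers in order while prefetching the next layer's weights."""
    trace = []
    # One worker thread plays the role of a dedicated copy stream (assumption).
    with ThreadPoolExecutor(max_workers=1) as copier:
        pending = copier.submit(fetch_layer, 0)  # fill the first buffer
        for i in range(num_layers):
            weights = pending.result()  # wait until buffer i is ready
            if i + 1 < num_layers:
                # Start filling the other buffer before computing layer i.
                pending = copier.submit(fetch_layer, i + 1)
                trace.append(("prefetch", i + 1))
            activations = layer_template(activations, weights)
            trace.append(("compute", i))
            del weights  # drop the reference; no per-layer state persists
    return activations, trace


output, trace = streamed_forward([1.0, 2.0], len(HOST_WEIGHTS))
print(output)  # [1.5, -0.5]
```

At most two layers' weights are alive at any time (the one being computed and the one being fetched), which is the double-buffering part. A real system would presumably use pinned host memory, asynchronous copies and separate CUDA streams, and would also stream gradients back out during the backward pass; none of that is modelled here.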
