AI & machine learning

Training

The one-time process of teaching a neural network to perform a task by showing it massive amounts of example data and repeatedly adjusting its internal weights until its outputs reliably match the examples. Training builds the model; inference uses it.

Also known as: model training, pre-training

Training is the part of AI development that costs real money. A single frontier-scale training run uses tens of thousands of GPUs running continuously for weeks, processing petabytes of text or images and adjusting billions of internal parameters in tiny increments over millions of optimisation steps. The total cost is often $20-100 million. The output is a single frozen set of weights that can then be used indefinitely without any further training cost.
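At toy scale, the entire process reduces to one loop: forward pass, measure the error, compute gradients, nudge the weights. The sketch below does this in PyTorch with a made-up model and random stand-in data; the architecture, sizes, and hyperparameters are purely illustrative, not anything a real lab runs.

```python
# Minimal sketch of the core training loop (PyTorch, toy scale).
# Model shape, data, and hyperparameters are illustrative stand-ins.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

for step in range(10_000):           # frontier runs take millions of steps
    x = torch.randn(32, 128)         # stand-in for a batch of real data
    target = torch.randn(32, 128)
    loss = loss_fn(model(x), target)
    optimizer.zero_grad()
    loss.backward()                  # compute gradients for every parameter
    optimizer.step()                 # nudge each parameter a tiny amount

# After training, the weights are frozen: inference is just forward passes.
model.eval()
with torch.no_grad():
    output = model(torch.randn(1, 128))
```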

There are several phases to modern training. Pre-training teaches the model the basic statistical patterns of language or images by feeding it raw internet-scale data. Fine-tuning adapts the pre-trained model to specific tasks or domains using smaller, higher-quality datasets. Post-training (RLHF, DPO, constitutional AI) shapes the model’s behaviour to match what humans actually want rather than just predicting the next token. Each phase is progressively smaller and cheaper than the one before, which is why fine-tuning existing open-weight models has become a practical alternative to training from scratch.
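Fine-tuning at its simplest looks like the sketch below: load the expensive pre-trained weights, freeze them, and train only a small new piece on a modest dataset. The file name, layer sizes, and toy data are hypothetical placeholders; real fine-tuning methods (full-parameter, LoRA, RLHF) differ mainly in which weights they unfreeze.

```python
# Hedged sketch of fine-tuning: reuse pre-trained weights, freeze most of
# them, and train only a small task-specific head on a small dataset.
# "pretrained.pt", the layer sizes, and the toy data are all hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 256))
backbone.load_state_dict(torch.load("pretrained.pt"))  # output of pre-training
for p in backbone.parameters():
    p.requires_grad = False                            # freeze the expensive part

head = nn.Linear(256, 10)                              # small new task layer
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

# A few hundred labelled examples, versus petabytes for pre-training.
batches = [(torch.randn(16, 128), torch.randint(0, 10, (16,))) for _ in range(200)]
for x, y in batches:
    loss = F.cross_entropy(head(backbone(x)), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```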

The DeAI angle on training is the hardest technical problem in the whole space. Training neural networks requires tightly synchronised communication between GPUs because every gradient update depends on the state of every other GPU in the cluster. Splitting this across the public internet without catastrophic slowdown was widely considered impossible until projects like Nous Research’s DisTrO and Templar’s Covenant demonstrated it could work at medium scale. Prime Intellect’s INTELLECT runs pushed the envelope further, then reverted to centralised clusters for INTELLECT-3. The technology is real but it’s still 18-24 months behind frontier centralised training.
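To see why this is hard, here is what vanilla data-parallel training does on every single step: each worker blocks until it has exchanged gradients with every other worker. On a datacentre fabric that exchange takes milliseconds; over the public internet it dominates everything, which is the bottleneck DisTrO-style methods attack by drastically shrinking what gets exchanged. The function below is a generic illustration of the synchronisation, not code from any of the projects named above.

```python
# Generic data-parallel step: every worker must average its gradients with
# every other worker before any of them can update. Assumes torch.distributed
# was initialised at startup via dist.init_process_group(...).
import torch.distributed as dist

def synchronised_step(model, optimizer, loss):
    loss.backward()
    world_size = dist.get_world_size()
    for p in model.parameters():
        # Blocks until EVERY peer has contributed its gradient for this tensor.
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad /= world_size             # average across the whole cluster
    optimizer.step()
    optimizer.zero_grad()
```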

The honest framing is that decentralised training today is decentralised between research labs and small data centres, not between individuals. Covenant-72B required each participating peer to run a node of at least 8x NVIDIA B200 GPUs, roughly $240K-$320K of hardware per participant. The “anyone with a gaming PC can train AI” framing some projects use is marketing. What decentralised training actually proves is that distributed clusters can coordinate over the public internet to produce frontier-scale models without sharing a physical building, which is interesting but a much narrower claim.
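The per-participant figure is simple arithmetic on a per-card price; the ~$30K-$40K implied B200 price below is an assumption backed out of the quoted range, not a vendor quote.

```python
# Back-of-envelope check on the per-peer hardware figure above.
gpus_per_peer = 8
card_price_low, card_price_high = 30_000, 40_000   # assumed USD per B200
print(gpus_per_peer * card_price_low)    # 240000
print(gpus_per_peer * card_price_high)   # 320000
```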

Related terms