AI & machine learning

Fine-tuning

The process of taking a pre-trained model and training it further on a smaller, more specialised dataset to adapt it to a specific task, domain, or style. Fine-tuning is much cheaper than training from scratch.

Also known as: finetuning, SFT, supervised fine-tuning

Fine-tuning exists because training a frontier model from scratch is so expensive that almost nobody can afford to do it. Pre-training a 70-billion-parameter model from random weights costs tens of millions of dollars and weeks of compute on thousands of GPUs. Fine-tuning the same model on a specific domain costs thousands of dollars and a few hours on a handful of GPUs. The pre-trained model has already learned general language and reasoning; fine-tuning teaches it your specific style, vocabulary, or task.

The most common kinds of fine-tuning are supervised fine-tuning (SFT), where you show the model paired examples of “input, desired output” and train it to reproduce the desired output; instruction tuning, a special case of SFT focused on teaching the model to follow user instructions; and preference tuning (RLHF, DPO, constitutional AI), a family of methods that adjust the model’s behaviour based on which outputs humans or other models prefer. Modern open-weight models like Llama 4 and Hermes 4 were pre-trained centrally and then fine-tuned by various teams to add specific capabilities or align with specific values.
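The “train it to produce the desired output” part of SFT is typically implemented with label masking: the prompt and the desired response are concatenated into one token sequence, but the loss is computed only on the response positions. A minimal sketch, using hypothetical token IDs and the common -100 ignore-index convention from the PyTorch/Hugging Face ecosystem:

```python
IGNORE_INDEX = -100  # convention: loss functions skip positions with this label

def build_sft_example(prompt_ids, response_ids):
    """Concatenate prompt + response; mask prompt positions out of the labels.

    The model sees the full sequence as input, but gradient only flows
    from the response tokens -- that is what makes it *supervised* on
    the desired output rather than on the whole text.
    """
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

# Hypothetical token IDs for a pair like ("Translate: hello", "bonjour")
prompt_ids = [101, 4521, 7592]
response_ids = [8757, 102]

input_ids, labels = build_sft_example(prompt_ids, response_ids)
print(input_ids)  # [101, 4521, 7592, 8757, 102]
print(labels)     # [-100, -100, -100, 8757, 102]
```

Instruction tuning uses exactly this machinery; the only difference is that the paired data consists of user instructions and good responses to them.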

LoRA (Low-Rank Adaptation) is a fine-tuning technique that’s particularly important for resource-constrained use cases. Instead of updating every parameter in the model, LoRA freezes the original weights and trains a small set of “adapter” parameters on top. This makes fine-tuning much faster and cheaper, and you can stack many LoRA adapters on the same base model to get different specialised versions without storing many copies of the full model. Most DeAI fine-tuning happens via LoRA or similar parameter-efficient methods.
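In linear-algebra terms, LoRA keeps a frozen weight matrix W and learns a low-rank update ΔW = B·A, where A is r×d and B is d×r with rank r far smaller than d. A minimal NumPy sketch (the shapes, zero-init of B, and α/r scaling follow the LoRA paper; the specific sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 512, 512, 8, 16   # rank r << d

W = rng.normal(size=(d_out, d_in))        # frozen pre-trained weight, never updated
A = rng.normal(size=(r, d_in)) * 0.01     # trainable adapter, small random init
B = np.zeros((d_out, r))                  # trainable adapter, zero init: ΔW starts at 0

def lora_forward(x):
    """h = W x + (alpha/r) * B (A x); only A and B receive gradients."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B = 0, the adapted model's output equals the base model's exactly,
# so fine-tuning starts from the pre-trained behaviour.
assert np.allclose(lora_forward(x), W @ x)

# Adapter size vs. the full weight matrix: r * (d_in + d_out) vs. d_in * d_out.
full = W.size                  # 262144
adapter = A.size + B.size      # 8192
print(adapter / full)          # 0.03125 -- ~3% here, far less at real model scale
```

Stacking adapters works because only the small (A, B) pairs differ between specialised versions; the shared W can be served once for all of them.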

In DeAI, fine-tuning is the primary way most projects produce custom model capabilities. Nous Research’s Hermes models are open-weight community fine-tunes of Llama and other base models. Many DeAI inference networks let users bring their own LoRA adapters and run them against shared base models. Some decentralised training projects (Templar, Prime Intellect) have explored fine-tuning workloads as their first commercial use case because the compute requirements are an order of magnitude smaller than full pre-training. The OYM “Decentralised AI Training” article covers the technical and economic landscape in more detail.

Related terms