In this guide, you'll learn how to fine-tune your own LLMs using Unsloth. Fine-tuning large language models with LoRA and QLoRA has become popular because it is efficient and needs relatively little compute. This step-by-step guide covers everything from how OpenAI (ChatGPT) and Anthropic (Claude) train their LLMs to practical tutorials where I show you exactly how to fine-tune your own LLMs with LoRA, QLoRA, and GRPO using Unsloth.
First, I'll explain why you should fine-tune LLMs and how fine-tuning can even enhance a RAG setup. Next, we'll discuss how to select the best open-source model on Hugging Face (such as Llama-3.3, Gemma-3, or DeepSeek) for fine-tuning. Finally, we'll dive into practical fine-tuning tutorials with Unsloth, showing you:
- How to use supervised fine-tuning (SFT) to train a LoRA for a completion model that generates creative ASCII art.
- How to use supervised fine-tuning (SFT) to train a QLoRA for a chat model (see the sketch after this list).
- How to fine-tune an LLM with Group Relative Policy Optimization (GRPO) to create an inference-time reasoning model like DeepSeek-R1 (sketch below).
- How to quantize and convert your fine-tuned model to GGUF (sketch below).
- How to run your fine-tuned model locally with Ollama or llama.cpp.
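
To give you a taste of the SFT workflow before you watch, here is a minimal QLoRA sketch following the pattern from the Unsloth notebooks. The model name, dataset file, and hyperparameters are placeholder assumptions, not the exact values from the video (argument names can also shift between trl versions):

from unsloth import FastLanguageModel  # import Unsloth first so it can patch transformers
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load a 4-bit quantized base model (the 4-bit base is what makes this QLoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct-bnb-4bit",  # placeholder model choice
    max_seq_length=2048,
    load_in_4bit=True,  # set False for a plain (non-quantized) LoRA
)

# Attach trainable low-rank adapters to the attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("json", data_files="train.jsonl", split="train")  # your own data

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # column holding the formatted training text
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()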
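
For GRPO, Unsloth plugs into trl's GRPOTrainer. Below is a hedged sketch with a toy reward function (a real setup would verify answer correctness against your dataset); it assumes a plain-text dataset with a "prompt" column and reuses the model and tokenizer from the sketch above:

from trl import GRPOConfig, GRPOTrainer

def reward_has_answer(completions, **kwargs):
    # Toy reward: +1.0 if the completion ends in a digit, else 0.0.
    return [1.0 if c.strip() and c.strip()[-1].isdigit() else 0.0
            for c in completions]

trainer = GRPOTrainer(
    model=model,                        # the PEFT model from the sketch above
    processing_class=tokenizer,
    reward_funcs=[reward_has_answer],
    args=GRPOConfig(
        output_dir="grpo-outputs",
        per_device_train_batch_size=4,  # must be divisible by num_generations
        num_generations=4,              # completions sampled per prompt (the "group")
        max_steps=100,
    ),
    train_dataset=dataset,              # needs a "prompt" column
)
trainer.train()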
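
Once training is done, Unsloth can export straight to GGUF via its save_pretrained_gguf helper. The output directory and quantization method below are my own choices, not the video's:

# Export the fine-tuned model as a quantized GGUF file.
model.save_pretrained_gguf(
    "gguf_model",                   # output directory (assumption)
    tokenizer,
    quantization_method="q4_k_m",   # a common 4-bit GGUF quantization
)

From there, write an Ollama Modelfile whose FROM line points at the exported .gguf file, then build and run it with "ollama create my-finetune -f Modelfile" followed by "ollama run my-finetune", or load the file directly with llama.cpp.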
GitHub repo with the resources used:
https://github.com/vossenwout/llm-finetuning-resources
Timestamps:
00:00:00 - Intro
00:02:00 - Why fine-tune your own LLM?
00:05:50 - Fine-tuning vs RAG
00:12:30 - How is ChatGPT trained?
00:16:25 - QLoRA fine-tuning explained
00:19:20 - Which LLM should I use?
00:26:53 - How to create a dataset?
00:31:26 - How to train for free?
00:34:30 - How to save and quantize your model as GGUF
00:37:20 - Inference with Ollama
00:38:20 - LoRA fine-tuning a completion model with Unsloth
00:59:50 - QLoRA fine-tuning a chat model with Unsloth
01:16:22 - Using GRPO to create a QLoRA reasoning model with Unsloth
#unsloth #finetuning #llm #lora #qlora #grpo #ollama #chatgpt #ai