License: MIT

Train Your Own LLM From Scratch

A straightforward method for training your LLM, from downloading data to generating text


Why Train Your Own LLM?

🚀

Easy Implementation

A complete transformer model implemented from scratch in PyTorch, based on the "Attention Is All You Need" paper

📊

Scalable Architecture

Train models ranging from millions to billions of parameters on a single GPU

🔧

Ready-to-Use Scripts

Complete pipeline for data download, preprocessing, training, and text generation

Stars: 3200
Forks: 464
Subscribers: 34
Open Issues: 5

How It Works

1. Download Data

Download the Pile dataset (825 GB) using scripts that can fetch only the files you select
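
Selective downloading comes down to choosing which shards to fetch. The sketch below only builds the URLs for the chosen shards and does not touch the network. It assumes the training split is stored as zero-padded, numbered `.jsonl.zst` shards under a `train/` folder; the `shard_urls` helper and the `example.com` host are hypothetical, not the repository's actual script interface.

```python
from typing import Iterable, List


def shard_urls(base_url: str, indices: Iterable[int]) -> List[str]:
    """Build download URLs for the selected training shards.

    Assumption: shards are named 00.jsonl.zst, 01.jsonl.zst, ... under
    <base_url>/train/. Adjust the pattern to match the real host.
    """
    urls = []
    # Deduplicate and sort so repeated requests do not fetch a shard twice.
    for i in sorted(set(indices)):
        if i < 0:
            raise ValueError(f"shard index must be non-negative, got {i}")
        urls.append(f"{base_url.rstrip('/')}/train/{i:02d}.jsonl.zst")
    return urls


# Hypothetical host: pick three shards instead of the full dataset.
selected = shard_urls("https://example.com/pile", [0, 1, 7])
for url in selected:
    print(url)
```

Starting with one or two shards is usually enough to check the rest of the pipeline before committing disk space to the full dataset.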

2. Preprocess Data

Tokenize and encode the dataset using tiktoken for efficient training
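
As a rough illustration of this step, the sketch below turns a list of documents into one flat stream of token ids and splits it into training and validation parts. The tokenizer is passed in as a plain function so the example runs without extra packages; `encode_corpus`, `split_tokens`, and the toy character-level encoder are hypothetical names used only here, and the 16-bit storage assumes a vocabulary smaller than 65,536 entries.

```python
from array import array
from typing import Callable, Iterable, List, Tuple


def encode_corpus(
    texts: Iterable[str],
    encode: Callable[[str], List[int]],
    eot_id: int,
) -> array:
    """Concatenate encoded documents, appending eot_id after each one."""
    tokens = array("H")  # unsigned 16-bit ints; raises OverflowError above 65535
    for text in texts:
        tokens.extend(encode(text))
        tokens.append(eot_id)
    return tokens


def split_tokens(tokens: array, val_fraction: float = 0.1) -> Tuple[array, array]:
    """Hold out the last val_fraction of the stream for validation."""
    n_val = int(len(tokens) * val_fraction)
    cut = len(tokens) - n_val
    return tokens[:cut], tokens[cut:]


def toy_encode(text: str) -> List[int]:
    """Stand-in tokenizer: one id per character (not a real BPE)."""
    return [ord(ch) for ch in text]


stream = encode_corpus(["ab", "c"], toy_encode, eot_id=0)
train_ids, val_ids = split_tokens(stream, val_fraction=0.2)
```

With tiktoken installed, the toy encoder could be swapped for a real one, for example `enc = tiktoken.get_encoding("gpt2")` with `enc.encode_ordinary` as `encode` and `enc.eot_token` as `eot_id`. Which encoding the project's scripts actually use is not stated here.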

3. Train Model

Train your transformer model at a configurable size, from 13M to billions of parameters
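
To get a feel for how configuration choices translate into model size, the parameter count of a GPT-style decoder can be estimated with simple arithmetic. The sketch below is plain Python and makes several assumptions that may not match this project's model: learned positional embeddings, a 4x MLP expansion, and biases and layer-norm weights left out of the count. The function and argument names are illustrative.

```python
def estimate_params(
    vocab_size: int,
    context_length: int,
    n_embed: int,
    n_blocks: int,
    mlp_ratio: int = 4,
    tie_embeddings: bool = True,
) -> int:
    """Approximate weight count of a decoder-only transformer."""
    # Token embedding table plus learned positional embeddings.
    embeddings = vocab_size * n_embed + context_length * n_embed
    # Query, key, value and output projections in each attention layer.
    attention = 4 * n_embed * n_embed
    # Up-projection and down-projection in each MLP.
    mlp = 2 * n_embed * (mlp_ratio * n_embed)
    # Output head; zero extra weights if it shares the embedding table.
    head = 0 if tie_embeddings else vocab_size * n_embed
    return embeddings + n_blocks * (attention + mlp) + head


# Settings comparable to a GPT-2 "small" model.
small = estimate_params(vocab_size=50257, context_length=1024, n_embed=768, n_blocks=12)
print(f"{small / 1e6:.0f}M parameters")
```

With these settings the estimate lands at about 124M. The per-block term grows with the square of `n_embed`, so widening the model raises the count much faster than adding blocks.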

4. Generate Text

Use the trained model to generate coherent text based on your prompts
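
Generation is an autoregressive loop: the model scores every possible next token, one token is sampled, and the extended sequence is fed back in. The sketch below shows that loop with temperature sampling in plain Python. The trained model is represented by a hypothetical `next_logits` callable that maps the token ids so far to a list of scores; this is an illustration of the idea, not the project's `generate_text.py`.

```python
import math
import random
from typing import Callable, List, Sequence


def sample_next(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Sample a token id from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def generate(
    next_logits: Callable[[List[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    max_new_tokens: int,
    temperature: float = 1.0,
    seed: int = 0,
) -> List[int]:
    """Extend prompt_ids by max_new_tokens sampled tokens."""
    rng = random.Random(seed)
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        ids.append(sample_next(next_logits(ids), temperature, rng))
    return ids


def toy_model(ids: List[int]) -> List[float]:
    """Stand-in for a trained model over a 5-token vocabulary."""
    favourite = (ids[-1] + 1) % 5
    return [2.0 if i == favourite else 0.0 for i in range(5)]


print(generate(toy_model, prompt_ids=[0], max_new_tokens=8, temperature=0.8))
```

Lower temperatures concentrate probability on the highest-scoring token and give more repetitive text; higher temperatures flatten the distribution and give more varied but less coherent output.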

GPU Requirements Comparison

GPU Name          Memory    Max Size
NVIDIA A100       40 GB     6B–8B
NVIDIA RTX 4090   24 GB     4B
NVIDIA RTX 3080   10 GB     1.2B
NVIDIA RTX 4060   8 GB      1B
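
One way to read the table is to ask how many bytes of GPU memory each row allows per parameter. The short calculation below does that division for the listed figures. It reads "GB" as 10^9 bytes and uses the upper end of the A100 range; both are assumptions, and the helper name is illustrative.

```python
def implied_bytes_per_param(memory_gb: float, max_params: float) -> float:
    """Memory budget per parameter if max_params just fits in memory_gb."""
    return memory_gb * 1e9 / max_params


# (memory in GB, largest model size in parameters), copied from the table.
table = {
    "NVIDIA A100": (40, 8e9),       # upper end of the 6B-8B range
    "NVIDIA RTX 4090": (24, 4e9),
    "NVIDIA RTX 3080": (10, 1.2e9),
    "NVIDIA RTX 4060": (8, 1e9),
}

for name, (memory_gb, max_params) in table.items():
    budget = implied_bytes_per_param(memory_gb, max_params)
    print(f"{name}: {budget:.1f} bytes per parameter")
```

The budgets come out between about 5 and 8.3 bytes per parameter. For comparison, 32-bit weights, gradients, and the two Adam moment buffers alone take 16 bytes per parameter, so the listed sizes presumably rely on reduced precision or other memory-saving settings and are best treated as rough upper bounds.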

Code Structure

train-llm-from-scratch/
├── src/
│   └── models/
│       ├── mlp.py
│       ├── attention.py
│       ├── transformer_block.py
│       └── transformer.py
├── config/
│   └── config.py
├── data_loader/
│   └── data_loader.py
├── scripts/
│   ├── train_transformer.py
│   ├── data_download.py
│   ├── data_preprocess.py
│   └── generate_text.py
├── data/
└── models/
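
The tree lists a `config/config.py` module, but its contents are not shown here. As an illustration of what a central hyperparameter module of this kind might hold, the sketch below groups model settings in a dataclass and checks one common constraint. Every field name and default value is hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical model hyperparameters; names and defaults are illustrative."""

    vocab_size: int = 50304
    context_length: int = 512
    n_embed: int = 512
    n_head: int = 8
    n_blocks: int = 4

    def __post_init__(self) -> None:
        # Multi-head attention splits the embedding evenly across heads.
        if self.n_embed % self.n_head != 0:
            raise ValueError("n_embed must be divisible by n_head")

    @property
    def head_size(self) -> int:
        """Width of each attention head."""
        return self.n_embed // self.n_head


config = ModelConfig()
print(config.head_size)
```

Keeping these values in one place means the model, training, and generation scripts can all import the same settings, and validating them at construction time catches mismatched sizes before any GPU time is spent.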

Sample Output

13 Million Parameter Model

In 1978, The park was returned to the factory-plate that the public share to the lower of the electronic fence that follow from the Station's cities. The Canal of ancient Western nations were confined to the city spot. The villages were directly linked to cities in China that revolt that the US budget and in Odambinais is uncertain and fortune established in rural areas.

2 Billion Parameter Model

There are two miles east coast from 1037 and 73 million refugees (hypotetus) as the same men and defeated Harvard, and Croft. At right east and West Nile's Mediterranean Sea jets. It was found there a number of parties, blacksmith, musician and boutique hospitality and inspire the strain delivered Canadians have already killed, rural branches with coalition railholder against Abyssy.

Ready to Start Training?

Join thousands of developers building their own language models
