A straightforward method for training your LLM, from downloading data to generating text
A complete transformer model built from scratch in PyTorch, based on the "Attention Is All You Need" paper
Train models from millions to billions of parameters using a single GPU
Complete pipeline for data download, preprocessing, training, and text generation
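To get a feel for how a configuration maps onto the "millions to billions" range, here is a rough sizing sketch. It assumes a GPT-style decoder-only stack with a 4x feed-forward expansion and a token embedding shared with the output layer; `ModelConfig`, its field names, and `estimate_params` are hypothetical and not taken from the project's code. Biases, LayerNorm weights, and positional parameters are ignored.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    # Hypothetical fields for illustration; the project's real config may differ.
    n_layer: int
    d_model: int
    vocab_size: int = 50257  # size of tiktoken's "gpt2" vocabulary


def estimate_params(cfg: ModelConfig) -> int:
    """Approximate parameter count, ignoring biases and LayerNorm weights."""
    attention = 4 * cfg.d_model ** 2     # Q, K, V and output projections
    feed_forward = 8 * cfg.d_model ** 2  # two linear layers, 4x expansion (assumed)
    embeddings = cfg.vocab_size * cfg.d_model  # assumed tied with the output layer
    return cfg.n_layer * (attention + feed_forward) + embeddings


if __name__ == "__main__":
    for cfg in (ModelConfig(n_layer=4, d_model=256),
                ModelConfig(n_layer=24, d_model=2048)):
        print(f"{cfg.n_layer} layers, d_model={cfg.d_model}: "
              f"{estimate_params(cfg) / 1e6:.1f}M parameters")
```

At small sizes the embedding table dominates the count, while at large sizes the per-layer term (about 12 × d_model² per layer) takes over, which is why depth and width are the main levers when scaling up.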
Download the Pile dataset (825GB) with scripts that support selective file downloads
Tokenize and encode the dataset using tiktoken for efficient training
Train your transformer model at a configurable size, from 13M parameters to billions
Use the trained model to generate coherent text based on your prompts
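The generation step can be pictured as a loop that repeatedly samples the next token from the model's output scores. The sketch below uses only the standard library and shows temperature scaling with optional top-k filtering, a common sampling scheme that may or may not match what the project uses. `sample_next_token`, `generate`, and the `toy_logits` stand-in model are hypothetical names for illustration; in practice the logits would come from the trained transformer and the ids from the tokenizer.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Pick a token id from raw logits using temperature and optional top-k."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be at least 1")
    # Rank token ids by score and keep the top_k best (all of them if top_k is None).
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    candidates = ranked if top_k is None else ranked[:top_k]
    # Softmax weights over the scaled logits; subtract the max for stability.
    scaled = [logits[i] / temperature for i in candidates]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(candidates, weights=weights, k=1)[0]


def generate(next_logits, prompt_ids, max_new_tokens, **sampling):
    """Autoregressive loop: append one sampled token id at a time."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        ids.append(sample_next_token(next_logits(ids), **sampling))
    return ids


def toy_logits(ids):
    # Stand-in for a trained model: strongly prefers (last id + 1) mod 5.
    return [5.0 if t == (ids[-1] + 1) % 5 else 0.0 for t in range(5)]


if __name__ == "__main__":
    print(generate(toy_logits, [0], 8, temperature=0.8, top_k=3))
```

Lower temperatures and smaller `top_k` values make the output more predictable; `top_k=1` reduces to greedy decoding.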
Join thousands of developers building their own language models