A small, learning-focused repository for understanding Transformer internals by implementing them from scratch.
- Currently a basic notebook-level implementation; probably better to stick with tiktoken (a usage sketch follows this list).
- Work on adding and handling special tokens for training and inference with LLMs.
- Implementation based on the Llama 3 reference code, src: https://github.com/meta-llama/llama3/blob/main/llama/model.py
- Trying to implement working code from the paper, including a GPT-2/Llama-style (decoder-only) LLM; a sketch of one building block follows this list.
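As a reference point for the tokenizer direction above, here is a minimal sketch of tokenizing with tiktoken, including an explicitly allowed special token. The encoding name and the sample strings are illustrative, not code from this repo.

```python
# Minimal sketch: tokenizing with tiktoken, including an explicit special token.
# The encoding name and sample strings are illustrative, not this repo's code.
import tiktoken

enc = tiktoken.get_encoding("gpt2")  # GPT-2 BPE vocabulary

text = "Hello, transformer!"
ids = enc.encode(text)
print(ids)              # list of token ids
print(enc.decode(ids))  # round-trips back to the original text

# Special tokens must be allowed explicitly, otherwise encode() raises an error
# when it finds their text in the input.
doc = "First document<|endoftext|>Second document"
ids = enc.encode(doc, allowed_special={"<|endoftext|>"})
print(enc.decode(ids))
```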
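For the model side, below is a small sketch of RMSNorm, one of the building blocks in the linked Llama model.py. It is slightly simplified relative to the reference (which casts to float32 before normalizing), and the shapes in the usage example are placeholders.

```python
# Minimal sketch of RMSNorm, a building block used in the Llama reference model.py.
# Simplified: the reference also casts to float32 for the normalization step.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-channel scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square of the features (no mean subtraction,
        # unlike LayerNorm), then apply the learned scale.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

# Placeholder shapes: (batch, seq_len, dim)
x = torch.randn(2, 8, 512)
print(RMSNorm(512)(x).shape)  # torch.Size([2, 8, 512])
```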
Current focus is on finishing the model, then experimenting with training, evaluation, and fine-tuning pipelines/scripts (a minimal training-loop sketch is shown below).
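Once the model is in place, a training loop can be as simple as next-token cross-entropy over shifted targets. The sketch below is a placeholder: `model`, `get_batch`, and all hyperparameters are assumptions, not existing code in this repo.

```python
# A minimal sketch of a next-token-prediction training step.
# `model`, `get_batch`, and the hyperparameters are placeholders, not this repo's code.
import torch
import torch.nn.functional as F

def train_step(model, optimizer, inputs, targets):
    """One optimization step of next-token (cross-entropy) training."""
    logits = model(inputs)                    # (batch, seq_len, vocab_size)
    loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)),     # flatten to (batch*seq_len, vocab_size)
        targets.view(-1),                     # flatten to (batch*seq_len,)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Typical usage (placeholder names):
# optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
# for step in range(num_steps):
#     inputs, targets = get_batch()           # targets are inputs shifted by one token
#     loss = train_step(model, optimizer, inputs, targets)
```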
This repo prioritizes clarity over optimization. It is meant for exploration, experimentation, and mapping theory → code — not for production use.