TRiP — TRansformer in Progress
A few-file, all-in-one C engine for Transformer AI models: inference, training, tokenizer creation, chat, and vision.
Built from scratch over 18 months (March 2024 to August 2025) of lunch breaks and weekend nights, TRiP exists simply because I wanted to truly understand transformer internals, from the matrix multiplications up.
TRiP's purpose is purely educational, for me and for anyone willing to learn about transformers. It supports Gemma 1, Llama 2, PaliGemma, and GPT-2, with full inference and training. It does not aim to track the latest model releases, and is not trying to compete with llama.cpp.
NOTE: since people keep asking, here is what's AI-generated in the code:

- the JSON parser (with some fixes)
- the safetensors checkpoint save function
- the JPEG/X11 handling functions (I had no interest in writing them)
- the final file split (I initially wrote everything as main.c :D )
- some revision of the comments before I committed
- this README, for the most part :D
That's all, I think; the rest is all hand-coded by me. It would have made no sense otherwise, since the whole point of doing this was to get as close as possible to a full-stack understanding of the transformer internals.
What TRiP supports
- Architectures: Llama2, Gemma 1.0/1.1, PaliGemma 1 (vision+language), GPT-2
- Checkpoint formats: SafeTensors (HuggingFace), Karpathy's llama2.c and gpt2 formats
- Weight types: bf16, float16, float32
- Training: full backpropagation with AdamW, cosine annealing LR, gradient clipping (see the sketch right after this list)
- Tokenizer: BPE (SentencePiece-compatible), with vocabulary creation from scratch
- Inference: greedy, top-k, and nucleus (top-p) sampling
- Chat: interactive chat with Llama, Gemma, and TinyLlama chat templates
- Vision: multimodal inference with PaliGemma (JPEG input, X11 display)
- Memory: RAM-optimized mode via mmap for large models on limited hardware
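For the curious: here is what those three training ingredients (AdamW, cosine-annealed learning rate, global-norm gradient clipping) look like when combined into a single update step. This is a minimal sketch with hypothetical names, not TRiP's code; the real optimizer lives in backward.c.

```c
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

void adamw_step(float *w, const float *g, float *m, float *v, long n,
                long t, long t_max,          /* current step (1-based), total steps */
                float lr_max, float lr_min,  /* cosine annealing endpoints          */
                float beta1, float beta2, float eps,
                float wd, float clip)        /* weight decay, max gradient norm     */
{
    /* cosine annealing: lr glides from lr_max down to lr_min over t_max steps */
    float lr = lr_min + 0.5f * (lr_max - lr_min)
                      * (1.0f + cosf((float)(M_PI * t / (double)t_max)));

    /* gradient clipping: rescale if the global L2 norm exceeds `clip` */
    float norm2 = 0.0f;
    for (long i = 0; i < n; i++) norm2 += g[i] * g[i];
    float gnorm = sqrtf(norm2);
    float scale = (gnorm > clip) ? clip / gnorm : 1.0f;

    for (long i = 0; i < n; i++) {
        float gi = g[i] * scale;
        m[i] = beta1 * m[i] + (1.0f - beta1) * gi;       /* 1st moment (mean)     */
        v[i] = beta2 * v[i] + (1.0f - beta2) * gi * gi;  /* 2nd moment (variance) */
        float mhat = m[i] / (1.0f - powf(beta1, (float)t));  /* bias correction   */
        float vhat = v[i] / (1.0f - powf(beta2, (float)t));
        /* AdamW = Adam + decoupled weight decay, applied directly to w */
        w[i] -= lr * (mhat / (sqrtf(vhat) + eps) + wd * w[i]);
    }
}
```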
Building
Dependencies
- gcc with OpenMP support (version 13 or higher recommended, for bfloat16 support)
- libjpeg-dev (or libjpeg62-turbo-dev)
- libx11-dev
WARNING: do NOT expect higher performance from bfloat16 or float16 on CPUs; today's CPUs are not optimized for floating-point arithmetic in those formats, and float32 always performs best. That surprised me a lot, too.
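One plausible way to see the overhead: without native bf16 arithmetic, a CPU must widen every bf16 weight to float32 in software before multiplying. The conversion is only a 16-bit shift (bf16 is the top half of an IEEE-754 float32), but it sits on the hot path of every matmul. A hedged sketch, not TRiP's code:

```c
#include <stdint.h>
#include <string.h>

static inline float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;  /* lower 16 mantissa bits become 0 */
    float f;
    memcpy(&f, &bits, sizeof f);        /* safe type-pun, optimized away   */
    return f;
}

/* dot product over bf16 weights: one widening per element, on every call */
float dot_bf16(const uint16_t *w, const float *x, long n) {
    float acc = 0.0f;
    for (long i = 0; i < n; i++)
        acc += bf16_to_f32(w[i]) * x[i];
    return acc;
}
```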
On Debian:
sudo apt install build-essential libomp-dev libjpeg62-turbo-dev libx11-dev
On Ubuntu:
sudo apt install build-essential libomp-dev libjpeg-dev libx11-dev
Windows (WSL)
TRiP runs natively under WSL (Windows Subsystem for Linux). To enable the X11 display features (vision mode, image display), install an X server on the Windows side:
- VcXsrv (free, lightweight)
- Xming (free version available)
- MobaXterm (has a built-in X server)
Then in your WSL terminal, before running TRiP:

```sh
export DISPLAY=:0
```

If using WSL2 (most setups), use this instead:

```sh
export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0
```
X11 is only needed for vision mode. Chat, inference, and training work without it.
Compile
```sh
make
```

That's it. No cmake, no external frameworks, no Python. Just `make`.
Quick start
Chat with a Gemma model
Download a Gemma-2B-IT model from HuggingFace (safetensors format), then:
```sh
./trip --chat \
  --checkpoint gemma-2b-it/model.safetensors \
  --tokenizer gemma-2b-it/tokenizer.json \
  --chat_scheme GEMMA
```
Run inference on a prompt
```sh
./trip --decode \
  --input_text "The capital of Italy is" \
  --checkpoint gemma-2b-it/model.safetensors \
  --tokenizer gemma-2b-it/tokenizer.json
```
Or from a text file:
```sh
./trip --decode prompt.txt \
  --checkpoint gemma-2b-it/model.safetensors \
  --tokenizer gemma-2b-it/tokenizer.json
```
Train a model
```sh
./trip --train \
  --checkpoint my_model/model.safetensors \
  --tokenizer my_model/tokenizer.json \
  --train_data my_dataset.txt \
  --train_config training_args.json
```
Vision (PaliGemma)
```sh
./trip --vision photo.jpg \
  --checkpoint paligemma/model.safetensors \
  --tokenizer paligemma/tokenizer.json \
  --input_text "Describe this image"
```
Build a tokenizer vocabulary from scratch
```sh
./trip --build_vocab corpus.txt --vocab_size 32000 --tokenizer my_tokenizer.json
```
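Under the hood, BPE vocabulary building is one loop: count adjacent token pairs, merge the most frequent pair into a new token, repeat until the target vocabulary size is reached. Below is a deliberately naive sketch of one merge round (integer token ids, quadratic pair counting); it illustrates the algorithm, not TRiP's actual implementation.

```c
/* One BPE merge round: find the most frequent adjacent token pair,
 * then replace every occurrence with a new id. Real implementations
 * index pair counts instead of rescanning the whole sequence. */
long bpe_merge_round(int *toks, long len, int new_id) {
    int best_a = 0, best_b = 0;
    long best_cnt = 1;  /* only merge pairs that occur at least twice */
    for (long i = 0; i + 1 < len; i++) {
        long cnt = 0;
        for (long j = 0; j + 1 < len; j++)
            if (toks[j] == toks[i] && toks[j + 1] == toks[i + 1]) cnt++;
        if (cnt > best_cnt) { best_cnt = cnt; best_a = toks[i]; best_b = toks[i + 1]; }
    }
    if (best_cnt == 1) return len;  /* nothing left to merge */

    long out = 0;
    for (long i = 0; i < len; i++) {
        if (i + 1 < len && toks[i] == best_a && toks[i + 1] == best_b) {
            toks[out++] = new_id; i++;  /* consume both halves of the pair */
        } else {
            toks[out++] = toks[i];
        }
    }
    return out;  /* new, shorter length; caller loops until vocab is full */
}
```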
Full CLI reference
```
USAGE: ./trip <ACTION> [OPTIONS...]
```
Actions (pick one)
| Flag | Description |
|---|---|
| `--decode [file]` | Run inference on a prompt (from file, `--input_text`, or stdin) |
| `--chat` | Interactive chat session |
| `--vision [image.jpg]` | Multimodal inference with an image |
| `--train` | Train the model |
| `--create` | Create a new model from a configuration file |
| `--build_vocab <data.txt>` | Build a new tokenizer vocabulary from a text corpus |
| `--utest` | Run unit tests |
| `--help` | Show help |
Model & tokenizer options
| Flag | Default | Description |
|---|---|---|
| `--checkpoint <path>` | default.model | Path to model checkpoint file(s) |
| `--checkpoint_type <type>` | SAFETENSORS | Format: `SAFETENSORS`, `LLAMA2_AK`, `GPT2_AK` |
| `--configuration <path>` | (auto) | Path to config.json (for SafeTensors) |
| `--tokenizer <path>` | default.tokenizer | Path to tokenizer file |
| `--tokenizer_format <type>` | JSON_HUGGINGFACE | Format: `JSON_HUGGINGFACE`, `LLAMA2_AK`, `GPT2_AK` |
| `--tokenizer_type <type>` | SENTENCEPIECE | Algorithm: `SENTENCEPIECE`, `TRIP` |
Inference & sampling options
| Flag | Default | Description |
|---|---|---|
| `--input_text "<prompt>"` | — | Provide prompt text directly on the command line |
| `--system_prompt "<text>"` | — | System prompt for chat mode |
| `--chat_scheme <scheme>` | (none) | Chat template: `LLAMA`, `TINY_LLAMA`, `GEMMA` |
| `--chat_save_context <file>` | — | Pre-process and save chat context for faster startup |
| `--chat_load_context <file>` | — | Load a previously saved chat context |
| `--temperature <value>` | 1.0 | Sampling temperature; 0.0 = greedy (always pick the most probable token) |
| `--top_p <value>` | 0.9 | Nucleus sampling: sample from the smallest set of tokens whose cumulative probability exceeds this value |
| `--top_k <value>` | (disabled) | Top-k sampling: sample from the k most probable tokens |
| `--ram` | (off) | Memory-map weights instead of loading them (slower, uses less RAM) |
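To make `--top_p` concrete, here is nucleus sampling in miniature. This is a hedged sketch, not TRiP's sampler; it assumes `probs[]` already sums to 1 and `idx[]` holds token ids sorted by descending probability.

```c
/* Nucleus sampling: keep the smallest set of top tokens whose cumulative
 * probability exceeds top_p, renormalize, and sample from that set. */
int sample_top_p(const float *probs, const int *idx, int vocab_size,
                 float top_p, float coin /* uniform random in [0,1) */) {
    /* find the nucleus boundary */
    float cum = 0.0f;
    int last = vocab_size - 1;
    for (int i = 0; i < vocab_size; i++) {
        cum += probs[idx[i]];
        if (cum > top_p) { last = i; break; }
    }
    /* sample inside the nucleus, renormalizing by its total mass `cum` */
    float r = coin * cum, acc = 0.0f;
    for (int i = 0; i <= last; i++) {
        acc += probs[idx[i]];
        if (r < acc) return idx[i];
    }
    return idx[last];  /* guard against floating-point rounding */
}
```

With `--temperature` and `--top_k` the pipeline has the same shape: scale the logits, softmax, truncate the candidate set, sample.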
Training options
| Flag | Default | Description |
|---|---|---|
| `--train_config <path>` | training_args.json | Path to training configuration JSON |
| `--train_data <path>` | training_data.txt | Path to training data (plain text) |
File map
TRiP is organized into 7 files. Open trip.h for the complete map.
| File | Lines | What it contains |
|---|---|---|
| trip.h | ~900 | The map. Every type, struct, global, and declaration. |
| math.c | ~3000 | Tensor ops, each forward+backward paired side by side: matmul, softmax, layernorm, RMSnorm, RoPE, attention, FFN activations, vector arithmetic |
| forward.c | ~1500 | Forward pass orchestration + token sampling |
| backward.c | ~1500 | Backward pass + AdamW optimizer + gradient management |
| model.c | ~5500 | Checkpoint I/O, model init, memory management, tokenizer, vision preprocessing |
| utils.c | ~1000 | Logging, JSON parser, terminal I/O, JPEG/X11 image handling |
| main.c | ~1900 | CLI argument parsing, chat loop, training loop, inference loop |
How it works (for the curious)
TRiP implements a transformer from first principles in C. No PyTorch, no TensorFlow, no ONNX — just linear algebra on arrays of floats.
The residual stream is the central concept: a vector that flows through the model like data on a bus. Each layer reads from it, processes it through attention and a feed-forward network, and writes back to it. The forward pass walks the layers top to bottom; the backward pass walks them bottom to top, computing gradients via the chain rule.
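In code, that read-process-write cycle is compact. The sketch below uses hypothetical struct and function names (the real orchestration is in forward.c, and the two block functions are only declared here); what matters is that `x`, the residual stream, is a single vector that every layer increments in place.

```c
typedef struct {
    const float *att_weights;   /* attention parameters (simplified)    */
    const float *ffn_weights;   /* feed-forward parameters (simplified) */
} Layer;

/* hypothetical sublayer computations: read the stream x, emit a delta */
void attention_block(float *delta, const float *x, const Layer *L, int dim);
void ffn_block(float *delta, const float *x, const Layer *L, int dim);

void forward_layers(float *x, int dim, const Layer *layers, int n_layers,
                    float *delta /* scratch buffer, dim floats */) {
    for (int l = 0; l < n_layers; l++) {
        attention_block(delta, x, &layers[l], dim);      /* read + process */
        for (int i = 0; i < dim; i++) x[i] += delta[i];  /* write back     */

        ffn_block(delta, x, &layers[l], dim);
        for (int i = 0; i < dim; i++) x[i] += delta[i];
    }
}
```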
Every math operation (math.c) is implemented as a forward+backward pair: you can read `rmsnorm()` and, immediately below it, `rmsnorm_backward()`, and see exactly how the gradient flows through the same computation in reverse.
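As a taste of that pairing, here is RMSNorm sketched both ways, with the standard chain-rule gradient. The signatures are hypothetical, not math.c's; the math is the usual RMSNorm derivative.

```c
#include <math.h>

/* forward: y[i] = g[i] * x[i] / rms,  rms = sqrt(mean(x^2) + eps) */
void rmsnorm_fw(float *y, const float *x, const float *g, int n, float eps) {
    float ms = 0.0f;
    for (int i = 0; i < n; i++) ms += x[i] * x[i];
    float rms = sqrtf(ms / n + eps);
    for (int i = 0; i < n; i++) y[i] = g[i] * x[i] / rms;
}

/* backward: the same computation walked in reverse via the chain rule.
 * dy is dL/dy coming from the layer above; dx and dg are accumulated. */
void rmsnorm_bw(float *dx, float *dg, const float *dy,
                const float *x, const float *g, int n, float eps) {
    float ms = 0.0f;
    for (int i = 0; i < n; i++) ms += x[i] * x[i];
    float rms = sqrtf(ms / n + eps);

    /* s = sum_i dy_i * g_i * x_i: shared by every dx_j through the rms term */
    float s = 0.0f;
    for (int i = 0; i < n; i++) s += dy[i] * g[i] * x[i];

    for (int j = 0; j < n; j++) {
        dg[j] += dy[j] * x[j] / rms;                    /* gain gradient  */
        dx[j] += dy[j] * g[j] / rms
               - x[j] * s / (n * rms * rms * rms);      /* input gradient */
    }
}
```

The shared scalar `s` is why normalization backward passes need a reduction over the whole vector: every input coordinate influenced the norm that scaled every output.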
I put a lot of comments in the code, both as reminders to myself and to make TRiP read like an annotated textbook on transformers.
For a deeper understanding of backpropagation, see Andrej Karpathy's lecture; TRiP would never have existed without his work.
License
CC BY-NC 4.0 — free to use, study, modify, and share for non-commercial purposes, with attribution. For commercial licensing, contact the author.
Acknowledgments