TRiP — TRansformer in Progress
A few-file, all-in-one C engine for Transformer AI models: inference, training, tokenizer creation, chat, and vision.
Built from scratch over 18 months (March 2024 to August 2025) of lunch breaks and weekend nights, TRiP exists simply because I wanted to truly understand transformer internals, from the matrix multiplications up.
TRiP's purpose is purely educational, for me and for anyone willing to learn about transformers. It supports Gemma 1, Llama 2, PaliGemma, and GPT-2, with full inference and training. It does not aim to track the latest model releases, and is not trying to compete with llama.cpp.
NOTE: since people keep asking, here is what's AI-generated in the code:

- the JSON parser (with some fixes)
- the safetensors checkpoint save function
- the JPEG/X11 handling functions (I had no interest in writing them)
- the final file split (I initially wrote everything as main.c :D )
- some revision of the comments before I committed
- this README, for the most part :D
That's all, I think; the rest is all hand-coded by me. It would have made no sense otherwise, since the whole point of doing this was to get as close as possible to a full-stack understanding of the transformer internals.
What TRiP supports
- Architectures: Llama2, Gemma 1.0/1.1, PaliGemma 1 (vision+language), GPT-2
- Checkpoint formats: SafeTensors (HuggingFace), Karpathy's llama2.c and gpt2 formats
- Weight types: bf16, float16, float32
- Training: full backpropagation with AdamW, cosine annealing LR, gradient clipping (see the sketch right after this list)
- Tokenizer: BPE (SentencePiece-compatible), with vocabulary creation from scratch
- Inference: greedy, top-k, and nucleus (top-p) sampling
- Chat: interactive chat with Llama, Gemma, and TinyLlama chat templates
- Vision: multimodal inference with PaliGemma (JPEG input, X11 display)
- Memory: RAM-optimized mode via mmap for large models on limited hardware
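For the curious: here is what those three training ingredients (AdamW, cosine-annealed learning rate, global-norm gradient clipping) look like when combined into a single update step. This is a minimal sketch with hypothetical names, not TRiP's code; the real optimizer lives in backward.c.

```c
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

void adamw_step(float *w, const float *g, float *m, float *v, long n,
                long t, long t_max,          /* current step (1-based), total steps */
                float lr_max, float lr_min,  /* cosine annealing endpoints          */
                float beta1, float beta2, float eps,
                float wd, float clip)        /* weight decay, max gradient norm     */
{
    /* cosine annealing: lr glides from lr_max down to lr_min over t_max steps */
    float lr = lr_min + 0.5f * (lr_max - lr_min)
                      * (1.0f + cosf((float)(M_PI * t / (double)t_max)));

    /* gradient clipping: rescale if the global L2 norm exceeds `clip` */
    float norm2 = 0.0f;
    for (long i = 0; i < n; i++) norm2 += g[i] * g[i];
    float gnorm = sqrtf(norm2);
    float scale = (gnorm > clip) ? clip / gnorm : 1.0f;

    for (long i = 0; i < n; i++) {
        float gi = g[i] * scale;
        m[i] = beta1 * m[i] + (1.0f - beta1) * gi;       /* 1st moment (mean)     */
        v[i] = beta2 * v[i] + (1.0f - beta2) * gi * gi;  /* 2nd moment (variance) */
        float mhat = m[i] / (1.0f - powf(beta1, (float)t));  /* bias correction   */
        float vhat = v[i] / (1.0f - powf(beta2, (float)t));
        /* AdamW = Adam + decoupled weight decay, applied directly to w */
        w[i] -= lr * (mhat / (sqrtf(vhat) + eps) + wd * w[i]);
    }
}
```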
Building
Dependencies
- gcc with OpenMP support (version 13 or higher recommended, for bfloat16 support)
- libjpeg-dev (or libjpeg62-turbo-dev)
- libx11-dev
WARNING: do NOT expect higher performance from bfloat16 or float16 on CPUs; today's CPUs are not optimized for floating-point arithmetic in those formats, and float32 always performs best. That surprised me a lot, too.
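One plausible way to see the overhead: without native bf16 arithmetic, a CPU must widen every bf16 weight to float32 in software before multiplying. The conversion is only a 16-bit shift (bf16 is the top half of an IEEE-754 float32), but it sits on the hot path of every matmul. A hedged sketch, not TRiP's code:

```c
#include <stdint.h>
#include <string.h>

static inline float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;  /* lower 16 mantissa bits become 0 */
    float f;
    memcpy(&f, &bits, sizeof f);        /* safe type-pun, optimized away   */
    return f;
}

/* dot product over bf16 weights: one widening per element, on every call */
float dot_bf16(const uint16_t *w, const float *x, long n) {
    float acc = 0.0f;
    for (long i = 0; i < n; i++)
        acc += bf16_to_f32(w[i]) * x[i];
    return acc;
}
```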
On Debian:
sudo apt install build-essential libomp-dev libjpeg62-turbo-dev libx11-dev
On Ubuntu:
sudo apt install build-essential libomp-dev libjpeg-dev libx11-dev
Windows (WSL)
TRiP runs natively under WSL (Windows Subsystem for Linux). To enable the X11 display features (vision mode, image display), install an X server on the Windows side:
- VcXsrv (free, lightweight)
- Xming (free version available)
- MobaXterm (has a built-in X server)
Then in your WSL terminal, before running TRiP:

```sh
export DISPLAY=:0
```

If using WSL2 (most setups), use this instead:

```sh
export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0
```
X11 is only needed for vision mode. Chat, inference, and training work without it.
Compile
```sh
make
```

That's it. No cmake, no external frameworks, no Python. Just `make`.
Quick start
Chat with a Gemma model
Download a Gemma-2B-IT model from HuggingFace (safetensors format), then:
```sh
./trip --chat \
  --checkpoint gemma-2b-it/model.safetensors \
  --tokenizer gemma-2b-it/tokenizer.json \
  --chat_scheme GEMMA
```
Run inference on a prompt
```sh
./trip --decode \
  --input_text "The capital of Italy is" \
  --checkpoint gemma-2b-it/model.safetensors \
  --tokenizer gemma-2b-it/tokenizer.json
```
Or from a text file:
```sh
./trip --decode prompt.txt \
  --checkpoint gemma-2b-it/model.safetensors \
  --tokenizer gemma-2b-it/tokenizer.json
```
Train a model
```sh
./trip --train \
  --checkpoint my_model/model.safetensors \
  --tokenizer my_model/tokenizer.json \
  --train_data my_dataset.txt \
  --train_config training_args.json
```
Vision (PaliGemma)
```sh
./trip --vision photo.jpg \
  --checkpoint paligemma/model.safetensors \
  --tokenizer paligemma/tokenizer.json \
  --input_text "Describe this image"
```
Build a tokenizer vocabulary from scratch
```sh
./trip --build_vocab corpus.txt --vocab_size 32000 --tokenizer my_tokenizer.json
```
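Under the hood, BPE vocabulary building is one loop: count adjacent token pairs, merge the most frequent pair into a new token, repeat until the target vocabulary size is reached. Below is a deliberately naive sketch of one merge round (integer token ids, quadratic pair counting); it illustrates the algorithm, not TRiP's actual implementation.

```c
/* One BPE merge round: find the most frequent adjacent token pair,
 * then replace every occurrence with a new id. Real implementations
 * index pair counts instead of rescanning the whole sequence. */
long bpe_merge_round(int *toks, long len, int new_id) {
    int best_a = 0, best_b = 0;
    long best_cnt = 1;  /* only merge pairs that occur at least twice */
    for (long i = 0; i + 1 < len; i++) {
        long cnt = 0;
        for (long j = 0; j + 1 < len; j++)
            if (toks[j] == toks[i] && toks[j + 1] == toks[i + 1]) cnt++;
        if (cnt > best_cnt) { best_cnt = cnt; best_a = toks[i]; best_b = toks[i + 1]; }
    }
    if (best_cnt == 1) return len;  /* nothing left to merge */

    long out = 0;
    for (long i = 0; i < len; i++) {
        if (i + 1 < len && toks[i] == best_a && toks[i + 1] == best_b) {
            toks[out++] = new_id; i++;  /* consume both halves of the pair */
        } else {
            toks[out++] = toks[i];
        }
    }
    return out;  /* new, shorter length; caller loops until vocab is full */
}
```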
Full CLI reference
```
USAGE: ./trip <ACTION> [OPTIONS...]
```
Actions (pick one)
| Flag | Description |
|---|---|
| `--decode [file]` | Run inference on a prompt (from file, `--input_text`, or stdin) |
| `--chat` | Interactive chat session |
| `--vision [image.jpg]` | Multimodal inference with an image |
| `--train` | Train the model |
| `--create` | Create a new model from a configuration file |
| `--build_vocab <data.txt>` | Build a new tokenizer vocabulary from a text corpus |
| `--utest` | Run unit tests |
| `--help` | Show help |
Model & tokenizer options
| Flag | Default | Description |
|---|---|---|
| `--checkpoint <path>` | default.model | Path to model checkpoint file(s) |
| `--checkpoint_type <type>` | SAFETENSORS | Format: `SAFETENSORS`, `LLAMA2_AK`, `GPT2_AK` |
| `--configuration <path>` | (auto) | Path to config.json (for SafeTensors) |
| `--tokenizer <path>` | default.tokenizer | Path to tokenizer file |
| `--tokenizer_format <type>` | JSON_HUGGINGFACE | Format: `JSON_HUGGINGFACE`, `LLAMA2_AK`, `GPT2_AK` |
| `--tokenizer_type <type>` | SENTENCEPIECE | Algorithm: `SENTENCEPIECE`, `TRIP` |
Inference & sampling options
| Flag | Default | Description |
|---|---|---|
| `--input_text "<prompt>"` | — | Provide prompt text directly on the command line |
| `--system_prompt "<text>"` | — | System prompt for chat mode |
| `--chat_scheme <scheme>` | (none) | Chat template: `LLAMA`, `TINY_LLAMA`, `GEMMA` |
| `--chat_save_context <file>` | — | Pre-process and save chat context for faster startup |
| `--chat_load_context <file>` | — | Load a previously saved chat context |
| `--temperature <value>` | 1.0 | Sampling temperature; 0.0 = greedy (always pick the most probable token) |
| `--top_p <value>` | 0.9 | Nucleus sampling: sample from the smallest set of tokens whose cumulative probability exceeds this value |
| `--top_k <value>` | (disabled) | Top-k sampling: sample from the k most probable tokens |
| `--ram` | (off) | Memory-map weights instead of loading them (slower, uses less RAM) |
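To make `--top_p` concrete, here is nucleus sampling in miniature. This is a hedged sketch, not TRiP's sampler; it assumes `probs[]` already sums to 1 and `idx[]` holds token ids sorted by descending probability.

```c
/* Nucleus sampling: keep the smallest set of top tokens whose cumulative
 * probability exceeds top_p, renormalize, and sample from that set. */
int sample_top_p(const float *probs, const int *idx, int vocab_size,
                 float top_p, float coin /* uniform random in [0,1) */) {
    /* find the nucleus boundary */
    float cum = 0.0f;
    int last = vocab_size - 1;
    for (int i = 0; i < vocab_size; i++) {
        cum += probs[idx[i]];
        if (cum > top_p) { last = i; break; }
    }
    /* sample inside the nucleus, renormalizing by its total mass `cum` */
    float r = coin * cum, acc = 0.0f;
    for (int i = 0; i <= last; i++) {
        acc += probs[idx[i]];
        if (r < acc) return idx[i];
    }
    return idx[last];  /* guard against floating-point rounding */
}
```

With `--temperature` and `--top_k` the pipeline has the same shape: scale the logits, softmax, truncate the candidate set, sample.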
Training options
| Flag | Default | Description |
|---|---|---|
| `--train_config <path>` | training_args.json | Path to training configuration JSON |
| `--train_data <path>` | training_data.txt | Path to training data (plain text) |
File map
TRiP is organized into 7 files. Open trip.h for the complete map.
| File | Lines | What it contains |
|---|---|---|
| trip.h | ~900 | The map. Every type, struct, global, and declaration. |
| math.c | ~3000 | Tensor ops, each forward+backward paired side by side: matmul, softmax, layernorm, RMSnorm, RoPE, attention, FFN activations, vector arithmetic |
| forward.c | ~1500 | Forward pass orchestration + token sampling |
| backward.c | ~1500 | Backward pass + AdamW optimizer + gradient management |
| model.c | ~5500 | Checkpoint I/O, model init, memory management, tokenizer, vision preprocessing |
| utils.c | ~1000 | Logging, JSON parser, terminal I/O, JPEG/X11 image handling |
| main.c | ~1900 | CLI argument parsing, chat loop, training loop, inference loop |
How it works (for the curious)
TRiP implements a transformer from first principles in C. No PyTorch, no TensorFlow, no ONNX — just linear algebra on arrays of floats.
The residual stream is the central concept: a vector that flows through the model like data on a bus. Each layer reads from it, processes it through attention and a feed-forward network, and writes back to it. The forward pass walks the layers top to bottom; the backward pass walks them bottom to top, computing gradients via the chain rule.
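In code, that read-process-write cycle is compact. The sketch below uses hypothetical struct and function names (the real orchestration is in forward.c, and the two block functions are only declared here); what matters is that `x`, the residual stream, is a single vector that every layer increments in place.

```c
typedef struct {
    const float *att_weights;   /* attention parameters (simplified)    */
    const float *ffn_weights;   /* feed-forward parameters (simplified) */
} Layer;

/* hypothetical sublayer computations: read the stream x, emit a delta */
void attention_block(float *delta, const float *x, const Layer *L, int dim);
void ffn_block(float *delta, const float *x, const Layer *L, int dim);

void forward_layers(float *x, int dim, const Layer *layers, int n_layers,
                    float *delta /* scratch buffer, dim floats */) {
    for (int l = 0; l < n_layers; l++) {
        attention_block(delta, x, &layers[l], dim);      /* read + process */
        for (int i = 0; i < dim; i++) x[i] += delta[i];  /* write back     */

        ffn_block(delta, x, &layers[l], dim);
        for (int i = 0; i < dim; i++) x[i] += delta[i];
    }
}
```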
Every math operation (math.c) is implemented as a forward+backward pair: you can read `rmsnorm()` and, immediately below it, `rmsnorm_backward()`, and see exactly how the gradient flows through the same computation in reverse.
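As a taste of that pairing, here is RMSNorm sketched both ways, with the standard chain-rule gradient. The signatures are hypothetical, not math.c's; the math is the usual RMSNorm derivative.

```c
#include <math.h>

/* forward: y[i] = g[i] * x[i] / rms,  rms = sqrt(mean(x^2) + eps) */
void rmsnorm_fw(float *y, const float *x, const float *g, int n, float eps) {
    float ms = 0.0f;
    for (int i = 0; i < n; i++) ms += x[i] * x[i];
    float rms = sqrtf(ms / n + eps);
    for (int i = 0; i < n; i++) y[i] = g[i] * x[i] / rms;
}

/* backward: the same computation walked in reverse via the chain rule.
 * dy is dL/dy coming from the layer above; dx and dg are accumulated. */
void rmsnorm_bw(float *dx, float *dg, const float *dy,
                const float *x, const float *g, int n, float eps) {
    float ms = 0.0f;
    for (int i = 0; i < n; i++) ms += x[i] * x[i];
    float rms = sqrtf(ms / n + eps);

    /* s = sum_i dy_i * g_i * x_i: shared by every dx_j through the rms term */
    float s = 0.0f;
    for (int i = 0; i < n; i++) s += dy[i] * g[i] * x[i];

    for (int j = 0; j < n; j++) {
        dg[j] += dy[j] * x[j] / rms;                    /* gain gradient  */
        dx[j] += dy[j] * g[j] / rms
               - x[j] * s / (n * rms * rms * rms);      /* input gradient */
    }
}
```

The shared scalar `s` is why normalization backward passes need a reduction over the whole vector: every input coordinate influenced the norm that scaled every output.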
I put a lot of comments in the code, both as reminders to myself and to make TRiP read like an annotated textbook on transformers.
For a deeper understanding of backpropagation, see Andrej Karpathy's lecture; TRiP would never have existed without his work.
License
CC BY-NC 4.0 — free to use, study, modify, and share for non-commercial purposes, with attribution. For commercial licensing, contact the author.
Acknowledgments