Introduction to Graphics Programming with Vulkan: A Comprehensive Guide from First Principles to a Complete 3D Renderer
"Vulkan is not a graphics API. It is a contract between you and the GPU — a raw, unforgiving protocol that rewards discipline and punishes assumptions."
Chapter 1: Foundations of Graphics Programming
1.1 What Is Computer Graphics?
Computer graphics is the discipline of generating, manipulating, and displaying visual content using computers. The field encompasses a vast range of sub-disciplines: 2D rasterization, 3D rendering, image processing, computational geometry, physically-based simulation, and real-time rendering for interactive applications. When we talk about "graphics programming" in the context of this guide, we primarily mean real-time 3D rendering — the process of taking a mathematical description of a three-dimensional scene and converting it into a two-dimensional image on screen, typically at 60 or more frames per second.
This is an extraordinary feat of engineering. A modern game running at 4K resolution must fill 8,294,400 pixels, sixty times per second. Each pixel might require sampling dozens of textures, evaluating complex lighting equations, and blending transparent surfaces. The mathematics involved spans linear algebra, calculus, signal processing, and numerical analysis. The engineering spans hardware design, compiler construction, operating system interaction, and high-performance software architecture. Graphics programming sits at the intersection of all of these.
Yet the fundamental pipeline — the conceptual flow from mathematical scene description to pixels on screen — has remained remarkably stable since the 1980s. Understanding this pipeline is the foundation upon which everything else in this guide is built.
1.2 The Rendering Pipeline: A Conceptual Overview
The rendering pipeline describes the sequence of transformations and operations that take raw geometric data and produce a final image. Before we discuss Vulkan specifically, we need to understand this pipeline in the abstract.
At the highest level, the process looks like this:
[ Scene Data ]
      |
      v
[ Geometry Processing ]   <- Transform vertices from 3D space to screen space
      |
      v
[ Rasterization ]         <- Convert geometric primitives into discrete pixels
      |
      v
[ Fragment Shading ]      <- Determine the color of each pixel
      |
      v
[ Output Merging ]        <- Combine fragments with existing framebuffer data
      |
      v
[ Final Image ]
Let's examine each stage.
The Geometry Stage
Three-dimensional geometry is represented as a collection of vertices — points in space with associated attributes like position, normal direction, texture coordinates, and color. These vertices are grouped into primitives — usually triangles, but also lines and points.
The geometry stage transforms vertices from their original coordinate space (called object space or model space) through a series of transformations into clip space, and ultimately into normalized device coordinates (NDC), which map onto the screen.
This transformation chain is the famous MVP transform:
Object Space
     |  [ Model Matrix ]
     v
World Space
     |  [ View Matrix ]
     v
View / Camera Space
     |  [ Projection Matrix ]
     v
Clip Space
     |  [ Perspective Division ]
     v
NDC Space (-1 to +1 in X and Y; 0 to 1 in Z for Vulkan)
     |  [ Viewport Transform ]
     v
Screen Space (pixels)
Each of these transformations is a 4×4 matrix multiplication, and the entire chain is often concatenated into a single MVP matrix that can be applied to each vertex in one operation.
The Model matrix transforms a mesh from its own local coordinate system into world space. If you have a cube model centered at the origin, the model matrix might translate it to position (5, 0, 3) in the world.
The View matrix transforms world space into camera space. It essentially moves everything in the world such that the camera is at the origin looking down the negative Z axis. If the camera moves right, the View matrix moves the entire world left.
The Projection matrix performs the perspective transformation — objects farther from the camera appear smaller. There are two common projection types:
Perspective projection : Mimics how human eyes and cameras work. Parallel lines converge at infinity. Defined by a field of view angle, an aspect ratio, and near/far clip planes.
Orthographic projection: Preserves parallel lines. Often used for 2D interfaces or CAD applications.
The Rasterization Stage
After geometry is transformed to screen space, the rasterizer converts vector geometry (triangles defined by their vertices) into fragments — candidate pixels. The rasterizer determines which pixels on screen are covered by each triangle, and for each covered pixel, it interpolates the vertex attributes across the triangle's surface.
This interpolation is what makes textures and colors smoothly vary across a surface. If a triangle has red at one corner, green at another, and blue at a third, the rasterizer will generate fragments with smoothly blended colors across the interior, computing the appropriate blend for each pixel based on its barycentric coordinates within the triangle.
The Fragment Stage
Each fragment produced by the rasterizer is processed by the fragment shader (or pixel shader in DirectX terminology). This is where the final color of each pixel is determined. The fragment shader has access to:
Interpolated vertex attributes (texture coordinates, normals, colors)
Texture samples
Uniform data (lights, material properties, time)
The fragment's position in screen space
A simple fragment shader might just return a solid color. A physically-based rendering (PBR) fragment shader might evaluate the Cook-Torrance BRDF, sample environment maps, compute shadow factors, and evaluate multiple light sources.
The Output Merger
The final stage combines the fragment output with the existing contents of the framebuffer. This is where operations like depth testing (discarding fragments that are occluded by already-rendered geometry), stencil testing (masking regions for special effects), and alpha blending (transparency) occur.
The depth buffer (or Z-buffer) stores the depth value of the closest rendered fragment at each pixel. When a new fragment arrives, its depth is compared to the stored value. If the new fragment is closer, it wins and updates both the color buffer and the depth buffer. If it's farther, it's discarded.
1.3 The History of Graphics APIs
Understanding why Vulkan exists requires understanding the history of graphics APIs and why developers were dissatisfied with what came before.
Early Direct Hardware Access (1970s-1980s)
In the early days of computer graphics, there was no abstraction layer. Programmers wrote directly to video memory, setting individual pixel values in a framebuffer. Programs had complete control and ran with maximum efficiency, but writing portable code was impossible — every graphics card had its own programming model.
Vendor-Specific APIs and Early Standardization (1980s-1990s)
As graphics hardware became more sophisticated and more varied, the need for standardized interfaces became clear. The early 1990s saw the rise of several competing standards:
IRIS GL from Silicon Graphics, which evolved into OpenGL
Direct3D from Microsoft, part of the DirectX suite
Various vendor-specific extensions and proprietary APIs
OpenGL, standardized by the Khronos Group, became the dominant cross-platform 3D graphics API. Its design philosophy was one of abstraction and ease of use. The programmer tells OpenGL what to draw and how it should look, and the driver figures out how to make the hardware do it.
The Fixed-Function Pipeline Era
Early OpenGL (and early Direct3D) used what is called a fixed-function pipeline. The transformations, lighting calculations, and texture operations were hardwired into the hardware and API. You configured the pipeline by setting state variables — "use this lighting model," "apply this texture in this way" — but you couldn't change the fundamental algorithm.
This was convenient for beginners and adequate for many applications, but it was inflexible. If you wanted a lighting effect not supported by the fixed-function pipeline, you were out of luck.
The Programmable Pipeline Revolution (Early 2000s)
The introduction of programmable vertex and pixel shaders in DirectX 8 (2000), greatly expanded by Shader Model 2.0 in DirectX 9 (2002), was revolutionary. Instead of fixed-function lighting and transformation, the programmer could write small programs — shaders — that executed on the GPU for each vertex and each fragment. This unlocked an explosion of visual effects previously impossible in real time.
OpenGL followed with GLSL (OpenGL Shading Language) and the corresponding extensions, later standardized in OpenGL 2.0.
The OpenGL 1.x / 2.x State Machine Model
OpenGL uses a state machine model. The global state of OpenGL — which textures are bound, what color is active, which shader program is in use, what blend mode is enabled — is modified by a series of function calls. To draw something, you set up state and then call draw commands.
This model has significant problems for modern hardware:
Hidden complexity and driver magic: The GPU driver must track all state changes, detect dependencies, optimize work batching, and compile shaders on demand. Much of this work happens invisibly, causing unpredictable performance and "hiccups."
Implicit synchronization: When you call glDraw*, OpenGL guarantees the result is correct, even if it requires stalling the GPU pipeline to ensure previous operations have completed. This implicit synchronization is safe but expensive.
Single-threaded by design: OpenGL was designed for a single thread to talk to a single GPU. Modern applications want to build command lists on multiple CPU threads simultaneously, then submit them all to the GPU.
Mutable global state: Debugging OpenGL code is notoriously difficult because any part of the codebase might have modified global state. The same draw call can produce different results depending on what happened before it.
OpenGL 4.x and the Attempt to Modernize
OpenGL 4.x (2010-2017) introduced features to address some of these problems: persistent mapped buffers, direct state access (DSA), compute shaders, SPIR-V support. But the API was burdened by two decades of legacy decisions. Supporting old code meant new features had to be grafted onto an increasingly awkward foundation.
Mantle: The Harbinger of Change
In 2013, AMD announced Mantle — a low-level GPU API designed to give developers direct access to GCN hardware with minimal driver overhead. Mantle demonstrated that moving synchronization, memory management, and pipeline state explicitly to the developer could yield dramatic performance improvements, particularly in CPU-limited scenarios.
Mantle was only for AMD hardware, but its influence was enormous. Both Microsoft and Khronos used it as inspiration.
Direct3D 12 and Metal
Direct3D 12 (2015) was Microsoft's response — a completely redesigned API that exposed GPU hardware much more directly, requiring explicit memory management, explicit synchronization, and explicit resource tracking. Metal (2014) was Apple's equivalent for their hardware ecosystem.
Both demonstrated that modern graphics applications could benefit enormously from explicit control, even at the cost of increased developer complexity.
Vulkan: The Cross-Platform Low-Level API
Vulkan (2016) was the Khronos Group's response — essentially a cross-platform version of Mantle's philosophy, incorporating lessons from Direct3D 12 and Metal. Developed with input from GPU vendors (AMD, NVIDIA, ARM, Intel, Qualcomm, Imagination), operating system vendors, game developers, and academia, Vulkan represents the state of the art in cross-platform explicit graphics APIs.
Vulkan runs on Windows, Linux, Android, macOS (via MoltenVK), and iOS. It supports desktop GPUs, mobile GPUs, and even some compute accelerators. It is the foundation for the next generation of graphics applications.
1.4 Coordinate Systems and Linear Algebra Review
Before diving into Vulkan, let's ensure we have a solid understanding of the mathematics we'll be using throughout this guide.
Vectors
A vector is an ordered tuple of numbers representing a direction and magnitude in space. In 3D graphics, we primarily use 3-component vectors (x, y, z) for positions and directions, and 4-component vectors (x, y, z, w) for homogeneous coordinates.
Key vector operations:
Addition: (a₁, a₂, a₃) + (b₁, b₂, b₃) = (a₁+b₁, a₂+b₂, a₃+b₃)
Scalar multiplication: s × (a₁, a₂, a₃) = (s×a₁, s×a₂, s×a₃)
Dot product: a · b = a₁b₁ + a₂b₂ + a₃b₃ = |a||b|cos(θ) The dot product is used for computing angles between vectors, projections, and in lighting calculations.
Cross product: a × b = (a₂b₃ - a₃b₂, a₃b₁ - a₁b₃, a₁b₂ - a₂b₁) The cross product produces a vector perpendicular to both inputs, with magnitude |a||b|sin(θ). Used for computing surface normals.
Normalization: n̂ = v / |v|, where |v| = √(v·v) A normalized vector has magnitude 1 and represents a pure direction.
Matrices
A matrix is a rectangular array of numbers. In 3D graphics, we primarily use 4×4 matrices to represent transformations in homogeneous coordinates.
[m00 m01 m02 m03]
[m10 m11 m12 m13]
[m20 m21 m22 m23]
[m30 m31 m32 m33]
Matrix-vector multiplication transforms a vector:
[m00 m01 m02 m03]   [x]   [m00·x + m01·y + m02·z + m03·w]
[m10 m11 m12 m13]   [y] = [m10·x + m11·y + m12·z + m13·w]
[m20 m21 m22 m23]   [z]   [m20·x + m21·y + m22·z + m23·w]
[m30 m31 m32 m33]   [w]   [m30·x + m31·y + m32·z + m33·w]
Matrix-matrix multiplication composes transformations.
Homogeneous Coordinates
In homogeneous coordinates, 3D positions are represented as 4-component vectors where the fourth component w = 1, while directions have w = 0. This elegant convention allows both translation and linear transformations to be represented as matrix multiplications:
Translation matrix (translate by (tx, ty, tz)):
[1 0 0 tx]
[0 1 0 ty]
[0 0 1 tz]
[0 0 0  1]
Scale matrix (scale by (sx, sy, sz)):
[sx  0  0  0]
[ 0 sy  0  0]
[ 0  0 sz  0]
[ 0  0  0  1]
Rotation matrix (rotate by θ around X axis):
[1    0        0     0]
[0  cos(θ) -sin(θ)   0]
[0  sin(θ)  cos(θ)   0]
[0    0        0     1]
After the projection matrix, the homogeneous w component becomes non-trivial (it encodes the depth). The perspective divide — dividing all components by w — produces normalized device coordinates.
Quaternions
For rotations, Euler angles (three stacked rotation angles) suffer from gimbal lock: when two rotation axes align, a degree of freedom is lost. Rotation matrices avoid gimbal lock but use nine values for three degrees of freedom and are awkward to interpolate. Quaternions are an alternative representation: a 4-component vector (x, y, z, w) that encodes an axis and angle of rotation without gimbal lock issues. They're more compact than matrices, easier to interpolate (using SLERP), and numerically more stable. Most game engines and 3D applications use quaternions for rotation storage, converting to matrices when necessary for rendering.
1.5 Color Models and Gamma Correction
Colors in computer graphics are typically represented as tuples of numbers. The most common model for displays is RGB — Red, Green, Blue — where each component ranges from 0 to 1 (or 0 to 255 in integer form). A color (R, G, B) = (1, 0, 0) is pure red. (0.5, 0.5, 0.5) is 50% grey.
For rendering with transparency, we add an alpha channel: RGBA. Alpha represents opacity, where 0 is fully transparent and 1 is fully opaque.
Gamma and Linear Color Spaces
Human perception of brightness is nonlinear — we can distinguish small differences in dark tones more easily than small differences in bright tones. CRT monitors happened to have a similarly nonlinear response (gamma ≈ 2.2), and image encoding adopted a matching gamma curve, which also makes more efficient use of limited bit depth.
This creates a critical issue for graphics programmers: textures are typically stored in gamma-encoded (sRGB) space for human perception, but lighting calculations must be done in linear space to be physically correct.
The correct workflow is:
1. Convert sRGB textures to linear when sampling (approximately: raise to power 2.2)
2. Perform all lighting in linear space
3. Convert output from linear to sRGB before display (approximately: raise to power 1/2.2)
If you skip this conversion, your lighting will look "washed out" and incorrect. Many graphics engines handle this transparently through sRGB framebuffer support — you declare your framebuffer as sRGB, and the hardware automatically linearizes on read and gamma-encodes on write.
Vulkan exposes this through the VK_FORMAT_B8G8R8A8_SRGB vs VK_FORMAT_B8G8R8A8_UNORM format distinction.
1.6 Rasterization vs Ray Tracing
It's worth briefly contrasting the rasterization approach (which Vulkan primarily targets, though it supports ray tracing too) with ray tracing.
Rasterization: For each triangle in the scene, determine which pixels it covers, then compute colors. The inner loop is over geometry. This is extremely efficient for dense geometry with many primitives, and maps beautifully to parallel GPU hardware.
Ray Tracing: For each pixel, cast a ray from the camera and determine what it hits. For each hit, cast more rays for reflections, shadows, ambient occlusion. The inner loop is over pixels. This naturally produces correct shadows, reflections, and global illumination, but historically required orders of magnitude more computation than rasterization.
Modern hardware (NVIDIA RTX series, AMD RDNA2, Intel Xe) has dedicated ray tracing accelerators, which Vulkan exposes through the VK_KHR_ray_tracing_pipeline and VK_KHR_acceleration_structure extensions (finalized in 2020). This guide focuses on rasterization, which remains the dominant technique for real-time rendering, but the architectural principles transfer directly to ray tracing.
Chapter 2: Understanding Modern GPU Architecture
To write efficient Vulkan code, you need a mental model of what's happening inside the GPU. You don't need to know every detail of a specific architecture, but understanding the broad strokes will inform every API decision you make.
2.1 The GPU as a Massively Parallel Processor
A modern discrete GPU contains tens to hundreds of compute units (streaming multiprocessors on NVIDIA, compute units on AMD), each processing many threads simultaneously. A high-end GPU might expose 10,000+ shader cores in total, capable of executing 10,000+ operations per clock cycle.
This massive parallelism is the GPU's superpower, but it comes with constraints that shape how you must program it.
Latency vs Throughput
A CPU is optimized for latency — executing a single thread of instructions as fast as possible. It has large, sophisticated out-of-order execution engines, branch predictors, and large caches (often 32MB+ L3) that try to ensure each instruction starts executing with minimal delay.
A GPU is optimized for throughput — executing as many threads as possible over time. It has simpler execution units, smaller caches, but many more of them. When one batch of threads stalls waiting for a memory access, the GPU switches to another batch of threads instantly (zero cost context switch). This latency hiding through massive thread parallelism is the core insight of GPU architecture.
Implications for the programmer:
Avoid divergence: All threads in a group (a "warp" on NVIDIA, a "wavefront" on AMD) execute the same instructions simultaneously. If threads take different branches (if/else), both branches execute and inactive threads do nothing. Keep shader code divergence minimal.
Memory access patterns matter: Sequential, predictable memory access allows the hardware to coalesce multiple accesses into fewer, wider operations. Random access patterns are expensive.
Use GPU memory: Transferring data between CPU (system RAM) and GPU (VRAM) over the PCIe bus is slow. Keep frequently accessed data in VRAM.
2.2 The GPU Memory Hierarchy
Modern GPUs have multiple levels of memory with different performance characteristics:
Register file: Fastest storage, private to each shader invocation. Limited in size — running out of registers forces the GPU to use slower memory and reduces parallelism (occupancy).
Shared memory / LDS (Local Data Store): Shared within a compute unit (workgroup). Programmer-controlled, very fast. Used in compute shaders for inter-thread communication within a group. ~64KB per compute unit on typical hardware.
L1 cache: ~16-128KB per compute unit. Cached, hardware-managed.
L2 cache: Shared across the entire GPU. ~2-8MB on modern GPUs.
VRAM (Video RAM): High-bandwidth GPU memory. GDDR6 at 500-1000 GB/s, or HBM2/HBM3 at 1000-3500 GB/s. But with ~100ns latency.
System RAM (host memory): Accessible to both CPU and GPU via PCIe or unified memory architectures. 50-100 GB/s bandwidth. Higher latency.
Vulkan exposes the memory hierarchy explicitly through memory types and memory heaps. The programmer must choose where to allocate each buffer and image, balancing access patterns, required features, and bandwidth needs.
2.3 Shader Execution Model
Shaders execute in fixed-size groups of threads. NVIDIA calls these groups warps (32 threads); AMD calls them wavefronts (64 threads on older GCN, 32 or 64 on RDNA). All threads in a warp/wavefront execute in lockstep — the same instruction at the same time on different data (SIMD execution).
This SIMD model has important consequences:
Register pressure: Each thread in a warp needs its own registers. More registers per thread means fewer threads can be active simultaneously, reducing the GPU's ability to hide latency.
Occupancy: The ratio of active warps to maximum possible warps. Higher occupancy generally means better latency hiding, but there's a sweet spot — sometimes fewer, larger warps with more work per thread outperform high-occupancy solutions.
Memory coalescing: When threads in a warp access consecutive memory addresses, the hardware can service all accesses in a single memory transaction. Non-coalesced accesses (random, strided) require multiple transactions and kill performance.
2.4 The Rendering Pipeline in Hardware
The conceptual pipeline described in Chapter 1 maps to physical hardware stages in the GPU:
Command Processor: Reads commands from command buffers submitted by the CPU. Manages dispatches to other hardware units.
Geometry Engine: Executes vertex shaders. Handles primitive assembly, tessellation, and geometry shaders. Outputs transformed triangles.
Rasterizer: Fixed-function hardware. Takes transformed triangles and generates fragments. Performs hierarchical Z culling to quickly discard covered regions.
Fragment/Pixel Shaders: Executes fragment shaders for each fragment. Access to textures and other resources.
ROP (Raster Operations Pipeline): Fixed-function hardware. Performs depth/stencil testing, alpha blending, and writes to the framebuffer. Can be a bottleneck in heavily blended scenes.
Texture Units: Specialized hardware for sampling textures. Supports bilinear, trilinear, and anisotropic filtering in hardware. Caches working sets to exploit temporal and spatial locality.
2.5 Synchronization and Hazards
When the GPU executes multiple draw calls, or when both the CPU and GPU access the same data, hazards can arise — situations where the wrong data is read because an operation hasn't completed yet.
Read-After-Write (RAW): Thread B reads data that Thread A is writing. If B executes before A finishes, B reads stale data.
Write-After-Read (WAR): Thread B writes data that Thread A reads. If B executes before A finishes reading, A may read partially updated data.
Write-After-Write (WAW): Two threads write to the same location. The final value depends on execution order.
In OpenGL, the driver handles all of this implicitly. You call glDraw* three times, and the results are always as if they executed in order, even if this requires stalling the pipeline.
In Vulkan, you are responsible for declaring dependencies between operations using pipeline barriers, semaphores, and fences. This is more work, but it means you can express exactly the synchronization you need — and no more. Unnecessary synchronization is expensive.
2.6 Tiled Architecture and Mobile GPUs
Mobile GPUs (ARM Mali, Qualcomm Adreno, Apple A-series, Imagination PowerVR) commonly use a tile-based deferred rendering (TBDR) architecture rather than the immediate-mode rendering of desktop GPUs.
In TBDR:
1. Geometry pass: All vertex shaders run, and the GPU bins triangles into screen-space tiles (typically 32×32 pixels).
2. Rasterization pass: One tile at a time, the GPU rasterizes all triangles that overlap the tile, performing depth testing with a small, on-chip depth buffer before running fragment shaders.
This architecture dramatically reduces bandwidth — the framebuffer and depth buffer for a tile fit entirely in on-chip SRAM, so reading and writing them doesn't touch main memory until the entire tile is done. On memory-bandwidth-constrained mobile devices, this is a huge win.
Vulkan accounts for tiled architectures through features like:
VK_ATTACHMENT_LOAD_OP_DONT_CARE and VK_ATTACHMENT_STORE_OP_DONT_CARE: Let the driver skip loading/storing tile data from/to main memory when it's not needed.
Subpasses and input attachments: Allow later passes to read results from earlier passes within the same tile without going to main memory.
Lazy memory allocation: Mobile memory types that are never actually backed by main memory if data can stay on-chip.
Understanding tiled architectures matters even if you're targeting desktop, because correctly handling render passes is important for portability.
Chapter 3: Why Vulkan? The Philosophy of Explicit APIs
3.1 The Problem with OpenGL
To truly appreciate what Vulkan offers, we need to understand the problems it solves. These aren't theoretical concerns — they caused real performance problems in real shipped games.
The Driver Complexity Problem
An OpenGL driver is extraordinarily complex. Consider what happens when you compile a GLSL shader:
1. The driver receives GLSL source code as a string at runtime.
2. It must compile this to an intermediate representation.
3. It must optimize the shader.
4. It must compile it to the specific GPU's machine code.
5. It may need to re-compile the shader if you change certain state later (because OpenGL allows state to affect shader behavior in complex ways).
This compilation can take hundreds of milliseconds — an eternity in a 16ms frame budget. OpenGL drivers deal with this through:
Shader compilation caching: Compile and cache shaders. But cache misses still cause hitches.
Deferred compilation: Don't fully compile until the shader is first used. This causes first-frame hitches.
Background compilation: Compile on a background thread. Risky; synchronization is complex.
None of these solutions are perfect, and all require the driver to make guesses about what the programmer intended.
The State Machine Hazard Problem
The OpenGL state machine means that rendering with a different shader, or with different textures, requires changing global state. The driver must:
Track which state changed between draw calls.
Determine which state changes affect shader correctness vs. just performance.
Possibly flush the GPU pipeline to avoid hazards.
Batch state changes optimally for the specific GPU.
This is an enormous amount of bookkeeping, and different drivers do it differently, leading to inconsistent performance across hardware.
The Multithreading Problem
OpenGL has one rendering context that can only be current on one thread at a time. Building commands for a single frame must happen sequentially on a single thread. As CPU core counts increased (modern CPUs have 16-32+ cores), this became a significant bottleneck.
Workarounds existed (multiple contexts, display lists) but were cumbersome and limited.
The Error Handling Problem
OpenGL reports errors through glGetError(), which you call after operations. It returns only the most recent error and clears the error state. Debugging multi-threaded or deferred OpenGL code was notoriously difficult.
3.2 Vulkan's Design Philosophy
Vulkan's design is guided by several principles:
Principle 1: Explicit over Implicit
Vulkan requires you to explicitly state your intentions. Want to use a buffer as a vertex buffer and then as a texture? You must explicitly transition its state. Want the GPU to wait for a previous operation before starting a new one? You must explicitly insert a barrier.
This feels like more work — and it is, initially. But it means:
No hidden synchronization stalls
No driver guessing about your intentions
Predictable, consistent performance
Driver overhead reduced by 10-100× compared to OpenGL
Principle 2: Application Controls Memory
In OpenGL, the driver allocates GPU memory for you. It decides when to upload textures, when to compact memory, when to evict unused resources. In Vulkan, you allocate memory, decide where to place resources, and manage the lifetime of all allocations.
This is complex but powerful. You can implement pool allocators, ring buffers, and other patterns tuned to your specific access patterns.
Principle 3: Threading First
Vulkan is designed from the ground up for multi-threaded usage. Command buffers can be built in parallel on any thread. You submit them to queues in any order. The API uses external synchronization (the application is responsible for not calling the same Vulkan function on the same object from multiple threads simultaneously) rather than internal synchronization (driver-maintained locks).
A well-written Vulkan application can keep all CPU cores busy building command buffers simultaneously.
Principle 4: Pre-compile and Pre-specify
Pipeline state that OpenGL tracked dynamically (which shader is active, what's the blend mode, what's the depth test function) is in Vulkan frozen into immutable pipeline state objects at creation time. Creating a pipeline object is expensive (it involves shader compilation), but it happens at load time, not at draw time. During rendering, switching pipelines is fast because all the hardware state is pre-computed.
This eliminates the major source of hitches in OpenGL: the driver compiling or reconfiguring state mid-frame.
Principle 5: Validation is Optional
In debug builds, you enable validation layers that check every API call for correctness and report errors with detailed messages. In release builds, you disable them and pay zero overhead for error checking.
This contrasts with OpenGL, which always does error checking.
3.3 The Cost of Explicit APIs
Vulkan's explicitness comes at a cost: it is much more verbose than OpenGL. A minimal "hello triangle" in OpenGL might be 100 lines of code. The same program in Vulkan is typically 700-1000 lines, because you must explicitly manage:
Instance and device creation
Surface creation and swap chain setup
Render passes
Pipeline state objects
Memory allocation and buffer creation
Descriptor sets and layouts
Command pool and command buffer management
Synchronization with semaphores and fences
This verbosity is not waste — every line serves a purpose. But it does mean Vulkan has a steeper initial learning curve, and it's not well-suited for quick prototyping.
When to use Vulkan:
Large applications where performance and predictability matter
Games, high-performance visualization, professional 3D tools
Applications targeting multiple platforms including mobile
When you need fine-grained control over GPU behavior
When to consider alternatives:
Small tools where development speed matters more than performance
Learning exercises (start with a simpler API if your goal is to learn concepts, not Vulkan itself)
Applications where a higher-level engine (Unity, Unreal) is appropriate
3.4 Vulkan vs Other Modern APIs
Vulkan vs Direct3D 12: Conceptually very similar. Both require explicit synchronization, explicit memory management, and pipeline state objects. D3D12 is Windows/Xbox only. Vulkan is cross-platform. D3D12 has slightly better tooling on Windows (PIX debugger). Vulkan has better hardware coverage (mobile, older AMD hardware). For a new cross-platform project, Vulkan is typically preferred.
Vulkan vs Metal: Metal is Apple's API, available only on Apple devices. Conceptually similar to Vulkan but with a higher-level memory model and simpler synchronization. If targeting Apple devices exclusively, Metal is the natural choice. MoltenVK translates Vulkan calls to Metal, enabling Vulkan code to run on Apple hardware.
Vulkan vs WebGPU: WebGPU is a new API for the web, designed to be a safer, more portable subset of modern GPU capabilities. It's inspired by Vulkan/D3D12/Metal but deliberately simpler. For web-based graphics, WebGPU (via WebAssembly + dawn or wgpu) is the modern choice.
Chapter 4: Setting Up Your Development Environment
4.1 What You'll Need
To follow this guide, you'll need:
A modern GPU with Vulkan support. Most GPUs from 2015 onwards support Vulkan:
NVIDIA GeForce 600 series and newer (Vulkan 1.0+)
AMD Radeon GCN architecture and newer (Vulkan 1.0+)
Intel HD 500 and newer (Vulkan 1.0+)
ARM Mali T700 and newer (mobile)
An up-to-date GPU driver with Vulkan support. Driver download sources:
NVIDIA: nvidia.com/Download/index.aspx
AMD: amd.com/en/support
Intel: intel.com/content/www/us/en/download-center/home.html
Operating system: Windows 10/11, Linux (any modern distribution), or macOS (via MoltenVK)
C++ compiler: MSVC (Visual Studio 2019+), GCC 9+, or Clang 10+
Build system: CMake 3.15+
Vulkan SDK: From LunarG (lunarg.com/vulkan-sdk/)
4.2 Installing the Vulkan SDK
The Vulkan SDK from LunarG provides:
Vulkan headers (the API definitions)
Vulkan loader (links your application to the driver)
Validation layers (debug checking)
GLSL to SPIR-V compiler (glslc, from Google)
Vulkan debugging utilities
Sample code
Windows Installation
1. Download the SDK installer from lunarg.com/vulkan-sdk/
2. Run the installer and accept the default options
3. The installer sets the VULKAN_SDK environment variable
4. Verify the installation by running vkconfig from the Start Menu
Linux Installation
Ubuntu/Debian:
```shell
wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo apt-key add -
sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list \
    https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list
sudo apt update
sudo apt install vulkan-sdk
vulkaninfo   # verify the installation
```
Arch Linux:
sudo pacman -S vulkan-devel
Fedora/RHEL:
sudo dnf install vulkan-devel glslang
macOS Installation
brew install vulkan-headers vulkan-loader molten-vk glslang
MoltenVK is automatically included in the LunarG macOS SDK and translates Vulkan calls to Metal.
4.3 Setting Up the Project with CMake
We'll use CMake as our build system. Here's a basic CMakeLists.txt for a Vulkan project:
```cmake
cmake_minimum_required(VERSION 3.15)
project(VulkanRenderer VERSION 1.0 LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

find_package(Vulkan REQUIRED)
find_package(glfw3 REQUIRED)
find_package(glm REQUIRED)

add_executable(VulkanRenderer
    src/main.cpp
    src/Application.cpp
    src/VulkanContext.cpp
    src/SwapChain.cpp
    src/Pipeline.cpp
    src/Renderer.cpp
)

target_link_libraries(VulkanRenderer
    Vulkan::Vulkan
    glfw
    glm::glm
)

target_include_directories(VulkanRenderer PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/src
    ${CMAKE_CURRENT_SOURCE_DIR}/include
)

add_custom_target(Shaders
    COMMAND ${CMAKE_COMMAND} -E copy_directory
        ${CMAKE_CURRENT_SOURCE_DIR}/shaders
        ${CMAKE_CURRENT_BINARY_DIR}/shaders
    DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/shaders
)
add_dependencies(VulkanRenderer Shaders)

find_program(GLSLC glslc)
if(GLSLC)
    file(GLOB GLSL_SOURCE_FILES
        "${CMAKE_CURRENT_SOURCE_DIR}/shaders/*.vert"
        "${CMAKE_CURRENT_SOURCE_DIR}/shaders/*.frag"
        "${CMAKE_CURRENT_SOURCE_DIR}/shaders/*.comp"
    )
    foreach(GLSL ${GLSL_SOURCE_FILES})
        get_filename_component(FILE_NAME ${GLSL} NAME)
        set(SPIRV "${CMAKE_CURRENT_BINARY_DIR}/shaders/${FILE_NAME}.spv")
        add_custom_command(
            OUTPUT ${SPIRV}
            COMMAND ${GLSLC} ${GLSL} -o ${SPIRV}
            DEPENDS ${GLSL}
        )
        list(APPEND SPIRV_BINARY_FILES ${SPIRV})
    endforeach()
    add_custom_target(CompileShaders DEPENDS ${SPIRV_BINARY_FILES})
    add_dependencies(VulkanRenderer CompileShaders)
endif()
```
Installing Dependencies
Windows (using vcpkg):
vcpkg install vulkan glfw3 glm stb tinyobjloader
Ubuntu/Debian:
sudo apt install libglfw3-dev libglm-dev libstb-dev
macOS (Homebrew):
brew install glfw glm
4.4 Project Structure
We'll organize our project as follows:
```
VulkanRenderer/
├── CMakeLists.txt
├── src/
│   ├── main.cpp
│   ├── Application.hpp
│   ├── Application.cpp
│   ├── VulkanContext.hpp
│   ├── VulkanContext.cpp
│   ├── SwapChain.hpp
│   ├── SwapChain.cpp
│   ├── Pipeline.hpp
│   ├── Pipeline.cpp
│   ├── Buffer.hpp
│   ├── Buffer.cpp
│   ├── Image.hpp
│   ├── Image.cpp
│   ├── Renderer.hpp
│   └── Renderer.cpp
├── shaders/
│   ├── mesh.vert
│   ├── mesh.frag
│   └── skybox.frag
├── textures/
├── models/
└── include/
    ├── stb_image.h
    └── tiny_obj_loader.h
```
4.5 Enabling Validation Layers
Validation layers are the most important debugging tool for Vulkan. Before doing anything else, let's understand how to use them.
The primary validation layer is VK_LAYER_KHRONOS_validation , which was introduced with Vulkan SDK 1.1.106 and consolidates many previously separate layers. It checks:
Valid API usage (correct parameter ranges, object lifetimes)
Synchronization correctness (race conditions, missing barriers)
Memory management issues (leaks, aliasing)
Thread safety violations
Performance warnings
Enable validation in the Vulkan instance:
```cpp
const std::vector<const char*> validationLayers = {
    "VK_LAYER_KHRONOS_validation"
};

bool checkValidationLayerSupport() {
    uint32_t layerCount;
    vkEnumerateInstanceLayerProperties(&layerCount, nullptr);

    std::vector<VkLayerProperties> availableLayers(layerCount);
    vkEnumerateInstanceLayerProperties(&layerCount, availableLayers.data());

    for (const char* layerName : validationLayers) {
        bool layerFound = false;
        for (const auto& layerProperties : availableLayers) {
            if (strcmp(layerName, layerProperties.layerName) == 0) {
                layerFound = true;
                break;
            }
        }
        if (!layerFound) return false;
    }
    return true;
}
```
We'll add this to our instance creation in the next chapter.
4.6 Configuring the Debug Messenger
When validation layers find a problem, they need a way to report it. Vulkan provides the VK_EXT_debug_utils extension for this. Here's how to set up a debug messenger:
```cpp
static VKAPI_ATTR VkBool32 VKAPI_CALL debugCallback(
    VkDebugUtilsMessageSeverityFlagBitsEXT messageSeverity,
    VkDebugUtilsMessageTypeFlagsEXT messageType,
    const VkDebugUtilsMessengerCallbackDataEXT* pCallbackData,
    void* pUserData) {

    if (messageSeverity >= VK_DEBUG_UTILS_MESSAGE_SEVERITY_WARNING_BIT_EXT) {
        std::cerr << "[VALIDATION] " << pCallbackData->pMessage << "\n";
        if (messageSeverity == VK_DEBUG_UTILS_MESSAGE_SEVERITY_ERROR_BIT_EXT) {
            // Break into the debugger on errors (compiler-specific intrinsic).
#if defined(_MSC_VER)
            __debugbreak();
#else
            __builtin_trap();
#endif
        }
    }
    return VK_FALSE;
}

VkDebugUtilsMessengerCreateInfoEXT getDebugMessengerCreateInfo() {
    VkDebugUtilsMessengerCreateInfoEXT createInfo{};
    createInfo.sType = VK_STRUCTURE_TYPE_DEBUG_UTILS_MESSENGER_CREATE_INFO_EXT;
    createInfo.messageSeverity =
        VK_DEBUG_UTILS_MESSAGE_SEVERITY_VERBOSE_BIT_EXT |
        VK_DEBUG_UTILS_MESSAGE_SEVERITY_WARNING_BIT_EXT |
        VK_DEBUG_UTILS_MESSAGE_SEVERITY_ERROR_BIT_EXT;
    createInfo.messageType =
        VK_DEBUG_UTILS_MESSAGE_TYPE_GENERAL_BIT_EXT |
        VK_DEBUG_UTILS_MESSAGE_TYPE_VALIDATION_BIT_EXT |
        VK_DEBUG_UTILS_MESSAGE_TYPE_PERFORMANCE_BIT_EXT;
    createInfo.pfnUserCallback = debugCallback;
    return createInfo;
}
```
Chapter 5: Vulkan Core Concepts and Object Model
Before writing any Vulkan code, let's build a mental model of how the API is structured.
5.1 Handles and Dispatchable vs Non-Dispatchable Objects
Vulkan represents its objects as opaque handles — integers or pointers that you pass to API functions. There are two kinds:
Dispatchable handles (pointer-sized, different address per object):
VkInstance
VkPhysicalDevice
VkDevice
VkQueue
VkCommandBuffer
Each dispatchable handle carries a pointer to a dispatch table — the Vulkan loader uses it to route every call on that object to the correct driver.
Non-dispatchable handles (always 64-bit values; defined as plain uint64_t on 32-bit platforms and as distinct pointer types on 64-bit platforms for type safety):
VkBuffer, VkImage, VkImageView
VkPipeline, VkPipelineLayout
VkRenderPass, VkFramebuffer
VkShaderModule
VkDescriptorSet, VkDescriptorSetLayout, VkDescriptorPool
VkSampler
VkSwapchainKHR
VkSemaphore, VkFence, VkEvent
VkDeviceMemory
VkCommandPool
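The distinction is visible in how the real headers define the handle types. The following is a simplified sketch of the macros in vulkan_core.h (names prefixed MY_ here so they don't clash with the real definitions; the actual platform check in the SDK header covers more targets):

```cpp
#include <cstdint>

// Dispatchable handles are always opaque struct pointers.
#define MY_DEFINE_HANDLE(object) typedef struct object##_T* object;

// Non-dispatchable handles are pointer-typed on 64-bit platforms
// (so the compiler can tell a VkBuffer from a VkImage) and plain
// uint64_t on 32-bit platforms.
#if defined(__LP64__) || defined(_WIN64)
  #define MY_DEFINE_NON_DISPATCHABLE_HANDLE(object) \
      typedef struct object##_T* object;
#else
  #define MY_DEFINE_NON_DISPATCHABLE_HANDLE(object) typedef uint64_t object;
#endif

MY_DEFINE_HANDLE(VkInstance)                  // dispatchable
MY_DEFINE_NON_DISPATCHABLE_HANDLE(VkBuffer)   // non-dispatchable
```

On 32-bit builds all non-dispatchable handles collapse to the same uint64_t type, which is why mixing them up there compiles cleanly but fails at runtime — one more reason to keep validation layers on.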
5.2 Vulkan Object Lifetime Rules
Every Vulkan object must be explicitly created and destroyed. The API uses symmetric create/destroy function pairs:
```cpp
VkSomeObject obj;
VkSomeObjectCreateInfo createInfo = { /* ... */ };
vkCreateSomeObject(device, &createInfo, nullptr, &obj);
// ... use obj ...
vkDestroySomeObject(device, obj, nullptr);
```
The nullptr parameter is an optional allocator callback — you can provide custom CPU memory allocators, but nullptr uses the default allocator.
Object dependencies: Some objects depend on others and must be destroyed before their dependencies:
Images must be destroyed before their memory is freed
Image views must be destroyed before the image they view
Framebuffers must be destroyed before their render pass
Pipelines must be destroyed before their pipeline layout
The device must be destroyed before the instance
Failing to destroy objects in order is a validation error.
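A common way to get this ordering right is a deletion queue: register a destruction lambda when each object is created, then run the lambdas in reverse at shutdown. A minimal sketch (the `DeletionQueue` class is our own helper pattern, not part of the Vulkan API):

```cpp
#include <functional>
#include <vector>

// Destroys registered objects in reverse creation order. Because
// dependencies are always created before their dependents, reverse
// order automatically satisfies the destruction rules above.
class DeletionQueue {
public:
    void push(std::function<void()> deleter) {
        deleters_.push_back(std::move(deleter));
    }
    void flush() {
        // Walk back-to-front: last created, first destroyed.
        for (auto it = deleters_.rbegin(); it != deleters_.rend(); ++it)
            (*it)();
        deleters_.clear();
    }
private:
    std::vector<std::function<void()>> deleters_;
};
```

Usage: push `vkDestroyInstance` first, then `vkDestroyDevice`, then each image view and framebuffer as it is created; a single `flush()` at shutdown then tears everything down in a legal order.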
5.3 Create Info Structures
Almost every Vulkan function takes a pointer to a create-info structure. These structures follow a consistent pattern:
```cpp
typedef struct VkSomeObjectCreateInfo {
    VkStructureType sType;
    const void*     pNext;
    VkFlags         flags;
} VkSomeObjectCreateInfo;
```
The sType field allows the driver to verify the structure type. The pNext field is a pointer to a linked list of extension structures, enabling optional features without changing the function signature. Always initialize structures to {} (C++ value initialization) to zero all fields, then set the ones you need.
5.4 Return Codes
Most Vulkan functions return VkResult , an enum with the following important values:
Success codes (zero or positive):
VK_SUCCESS (0): Operation completed successfully
VK_NOT_READY: Fence or query not yet available
VK_TIMEOUT: Wait timed out
VK_EVENT_SET / VK_EVENT_RESET: Event state
VK_INCOMPLETE: Result array too small
VK_SUBOPTIMAL_KHR: Swap chain can still present but may not be optimal
Error codes (negative):
VK_ERROR_OUT_OF_HOST_MEMORY: CPU memory exhausted
VK_ERROR_OUT_OF_DEVICE_MEMORY: GPU memory exhausted
VK_ERROR_INITIALIZATION_FAILED
VK_ERROR_DEVICE_LOST: GPU crash or driver bug
VK_ERROR_SURFACE_LOST_KHR: Window system surface gone (window closed)
VK_ERROR_OUT_OF_DATE_KHR: Swap chain must be recreated (window resized)
Always check return values! A convenience macro:
```cpp
VK_CHECK(vkCreateInstance(&createInfo, nullptr, &instance));
```
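The guide uses VK_CHECK without defining it. One plausible definition (ours, not from the SDK) reports the failing call and its source location, then aborts. VkResult is stubbed as an int here so the sketch compiles standalone; in a real project you would include <vulkan/vulkan.h> instead and delete the stubs:

```cpp
#include <cstdio>
#include <cstdlib>

// Stand-ins so the sketch compiles without the Vulkan SDK.
using VkResult = int;
constexpr VkResult VK_SUCCESS = 0;

// Evaluate the call exactly once; on failure, print the result code,
// the call text, and the source location, then abort so the error
// is impossible to miss.
#define VK_CHECK(call)                                             \
    do {                                                           \
        VkResult vk_check_result_ = (call);                        \
        if (vk_check_result_ != VK_SUCCESS) {                      \
            std::fprintf(stderr, "Vulkan error %d: %s (%s:%d)\n",  \
                         vk_check_result_, #call,                  \
                         __FILE__, __LINE__);                      \
            std::abort();                                          \
        }                                                          \
    } while (0)
```

The `do { } while (0)` wrapper makes the macro behave like a single statement, and storing the result in a local guarantees the wrapped call is evaluated only once.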
5.5 The Queue System
Vulkan separates commands from execution. You record commands into command buffers (on the CPU), then submit command buffers to queues for execution on the GPU.
A queue is an ordered sequence of work for the GPU. Commands submitted to a queue execute in submission order relative to each other (within the same queue). Different queues can execute concurrently.
Queues are grouped into queue families based on their capabilities:
Graphics queues: Can execute draw commands, compute dispatches, and transfers
Compute queues: Can execute compute dispatches and transfers
Transfer queues: Can only execute transfer (copy) operations
Video encode/decode queues: For hardware video codec operations (provided by the VK_KHR_video_queue family of extensions)
On a discrete GPU, you might find:
1 graphics queue family with 1-4 queues
1 compute queue family with 8+ queues (for async compute)
1 transfer queue family with 2 queues (for async DMA transfers)
On integrated graphics, there might be just one queue family supporting everything.
5.6 Extension System
Vulkan's core API is deliberately minimal. Additional features are provided through extensions:
Instance extensions: Add functionality to the Vulkan loader/instance. Examples: VK_KHR_surface, VK_EXT_debug_utils
Device extensions: Add functionality to a specific device. Examples: VK_KHR_swapchain, VK_KHR_ray_tracing_pipeline
Extensions are either:
KHR extensions: Ratified by Khronos, cross-vendor
EXT extensions: Multi-vendor collaboration
NV/AMD/ARM extensions: Vendor-specific
Important extensions we'll use:
VK_KHR_surface: Abstract windowing surface
VK_KHR_win32_surface / VK_KHR_xcb_surface / VK_KHR_metal_surface: Platform-specific window creation
VK_KHR_swapchain: Presenting images to the screen
VK_EXT_debug_utils: Debug naming and messaging
VK_KHR_shader_float16_int8: 16-bit floats and 8-bit integers in shaders
VK_EXT_descriptor_indexing: Bindless resources
Chapter 6: Instances, Physical Devices, and Logical Devices
We're ready to write our first Vulkan code. The sequence of object creation follows a fixed order:
```
VkInstance
└─> VkPhysicalDevice   (enumerated, not created)
    └─> VkDevice
        └─> VkQueue    (retrieved, not created)
```
6.1 Creating the VkInstance
The VkInstance is the root of all Vulkan state. It represents your application's connection to the Vulkan loader and drivers.
```cpp
class VulkanApplication {
public:
    VkInstance instance = VK_NULL_HANDLE;
    VkDebugUtilsMessengerEXT debugMessenger = VK_NULL_HANDLE;

    void createInstance() {
        VkApplicationInfo appInfo{};
        appInfo.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
        appInfo.pApplicationName = "My Vulkan App";
        appInfo.applicationVersion = VK_MAKE_VERSION(1, 0, 0);
        appInfo.pEngineName = "No Engine";
        appInfo.engineVersion = VK_MAKE_VERSION(1, 0, 0);
        appInfo.apiVersion = VK_API_VERSION_1_3;

        // Extensions required by GLFW, plus the debug utilities extension.
        uint32_t glfwExtensionCount = 0;
        const char** glfwExtensions =
            glfwGetRequiredInstanceExtensions(&glfwExtensionCount);
        std::vector<const char*> extensions(
            glfwExtensions, glfwExtensions + glfwExtensionCount);
        extensions.push_back(VK_EXT_DEBUG_UTILS_EXTENSION_NAME);

        // Verify every required extension is actually available.
        uint32_t availableExtCount = 0;
        vkEnumerateInstanceExtensionProperties(nullptr, &availableExtCount,
                                               nullptr);
        std::vector<VkExtensionProperties> availableExts(availableExtCount);
        vkEnumerateInstanceExtensionProperties(nullptr, &availableExtCount,
                                               availableExts.data());

        for (const char* required : extensions) {
            bool found = false;
            for (const auto& available : availableExts) {
                if (strcmp(required, available.extensionName) == 0) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                throw std::runtime_error(
                    std::string("Required extension not available: ") + required);
            }
        }

        VkInstanceCreateInfo createInfo{};
        createInfo.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
        createInfo.pApplicationInfo = &appInfo;
        createInfo.enabledExtensionCount =
            static_cast<uint32_t>(extensions.size());
        createInfo.ppEnabledExtensionNames = extensions.data();

        VkDebugUtilsMessengerCreateInfoEXT debugCreateInfo{};
        const std::vector<const char*> validationLayers = {
            "VK_LAYER_KHRONOS_validation"
        };
        if (checkValidationLayerSupport(validationLayers)) {
            createInfo.enabledLayerCount =
                static_cast<uint32_t>(validationLayers.size());
            createInfo.ppEnabledLayerNames = validationLayers.data();
            // Chain a messenger here so instance creation itself is validated.
            debugCreateInfo = getDebugMessengerCreateInfo();
            createInfo.pNext = &debugCreateInfo;
        }

        VkResult result = vkCreateInstance(&createInfo, nullptr, &instance);
        if (result != VK_SUCCESS) {
            throw std::runtime_error("Failed to create Vulkan instance");
        }
        std::cout << "Vulkan instance created successfully\n";
    }

    bool checkValidationLayerSupport(
            const std::vector<const char*>& layers) {
        uint32_t layerCount;
        vkEnumerateInstanceLayerProperties(&layerCount, nullptr);
        std::vector<VkLayerProperties> available(layerCount);
        vkEnumerateInstanceLayerProperties(&layerCount, available.data());

        for (const char* name : layers) {
            bool found = false;
            for (const auto& props : available) {
                if (strcmp(name, props.layerName) == 0) {
                    found = true;
                    break;
                }
            }
            if (!found) return false;
        }
        return true;
    }
};
```
Understanding VK_API_VERSION_1_3
When you specify apiVersion = VK_API_VERSION_1_3 , you're requesting that the Vulkan implementation support at least Vulkan 1.3 features. However, the actual version supported depends on the driver. You should query the actual version at runtime with vkEnumerateInstanceVersion() and adapt accordingly.
Vulkan 1.3 (released January 2022) added important features like dynamic rendering (no need for render pass objects), synchronization2 (improved synchronization primitives), and made several previously-extension features core. We'll use some of these in our examples.
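The packed apiVersion value can be decoded with the VK_VERSION_* macros. A sketch of the bit layout used by VK_MAKE_VERSION (constexpr functions here mirror the real macros; note that newer SDKs add a 3-bit variant field at the top via VK_MAKE_API_VERSION):

```cpp
#include <cstdint>

// Vulkan packs a version into a uint32_t as:
//   major (10 bits) | minor (10 bits) | patch (12 bits)
constexpr uint32_t makeVersion(uint32_t major, uint32_t minor, uint32_t patch) {
    return (major << 22) | (minor << 12) | patch;
}
constexpr uint32_t versionMajor(uint32_t v) { return v >> 22; }
constexpr uint32_t versionMinor(uint32_t v) { return (v >> 12) & 0x3FFu; }
constexpr uint32_t versionPatch(uint32_t v) { return v & 0xFFFu; }
```

With this layout, comparing the driver's reported version against `makeVersion(1, 3, 0)` is a plain integer comparison, which is why the runtime check described above is cheap.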
6.2 Setting Up the Debug Messenger
Now let's wire up the debug messenger we defined earlier:
```cpp
void setupDebugMessenger() {
    // vkCreateDebugUtilsMessengerEXT is an extension function and must be
    // loaded dynamically through vkGetInstanceProcAddr.
    auto createFunc = (PFN_vkCreateDebugUtilsMessengerEXT)
        vkGetInstanceProcAddr(instance, "vkCreateDebugUtilsMessengerEXT");
    if (!createFunc) {
        throw std::runtime_error("VK_EXT_debug_utils not available");
    }

    VkDebugUtilsMessengerCreateInfoEXT createInfo = getDebugMessengerCreateInfo();
    VkResult result = createFunc(instance, &createInfo, nullptr, &debugMessenger);
    if (result != VK_SUCCESS) {
        throw std::runtime_error("Failed to create debug messenger");
    }
}

void destroyDebugMessenger() {
    auto destroyFunc = (PFN_vkDestroyDebugUtilsMessengerEXT)
        vkGetInstanceProcAddr(instance, "vkDestroyDebugUtilsMessengerEXT");
    if (destroyFunc && debugMessenger != VK_NULL_HANDLE) {
        destroyFunc(instance, debugMessenger, nullptr);
    }
}
```
Note that vkCreateDebugUtilsMessengerEXT is an extension function, so we must load it dynamically with vkGetInstanceProcAddr rather than calling it directly.
6.3 Selecting a Physical Device
VkPhysicalDevice represents a GPU in your system. You enumerate all available physical devices and select the best one for your needs.
```cpp
struct QueueFamilyIndices {
    std::optional<uint32_t> graphicsFamily;
    std::optional<uint32_t> presentFamily;
    std::optional<uint32_t> computeFamily;
    std::optional<uint32_t> transferFamily;

    bool isComplete() const {
        return graphicsFamily.has_value() && presentFamily.has_value();
    }
};

class DeviceSelector {
public:
    VkPhysicalDevice selectBestDevice(VkInstance instance, VkSurfaceKHR surface) {
        uint32_t deviceCount = 0;
        vkEnumeratePhysicalDevices(instance, &deviceCount, nullptr);
        if (deviceCount == 0) {
            throw std::runtime_error("No Vulkan-capable GPUs found!");
        }
        std::vector<VkPhysicalDevice> devices(deviceCount);
        vkEnumeratePhysicalDevices(instance, &deviceCount, devices.data());

        VkPhysicalDevice best = VK_NULL_HANDLE;
        int bestScore = -1;
        for (const auto& device : devices) {
            int score = scoreDevice(device, surface);
            std::cout << "Device: " << getDeviceName(device)
                      << " score: " << score << "\n";
            if (score > bestScore) {
                bestScore = score;
                best = device;
            }
        }
        if (best == VK_NULL_HANDLE) {
            throw std::runtime_error("No suitable GPU found!");
        }
        std::cout << "Selected GPU: " << getDeviceName(best) << "\n";
        return best;
    }

private:
    std::string getDeviceName(VkPhysicalDevice device) {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(device, &props);
        return props.deviceName;
    }

    int scoreDevice(VkPhysicalDevice device, VkSurfaceKHR surface) {
        VkPhysicalDeviceProperties deviceProperties;
        VkPhysicalDeviceFeatures deviceFeatures;
        vkGetPhysicalDeviceProperties(device, &deviceProperties);
        vkGetPhysicalDeviceFeatures(device, &deviceFeatures);

        // Hard requirements: return -1 to disqualify the device entirely.
        QueueFamilyIndices indices = findQueueFamilies(device, surface);
        if (!indices.isComplete()) return -1;
        if (!checkDeviceExtensionSupport(device)) return -1;

        SwapChainSupportDetails swapChainSupport =
            querySwapChainSupport(device, surface);
        if (swapChainSupport.formats.empty() ||
            swapChainSupport.presentModes.empty()) return -1;

        if (!deviceFeatures.samplerAnisotropy) return -1;
        if (!deviceFeatures.geometryShader) return -1;

        // Soft preferences: discrete GPUs, more VRAM, newer API version.
        int score = 0;
        if (deviceProperties.deviceType ==
            VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU) {
            score += 10000;
        } else if (deviceProperties.deviceType ==
                   VK_PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU) {
            score += 1000;
        }

        VkPhysicalDeviceMemoryProperties memProps;
        vkGetPhysicalDeviceMemoryProperties(device, &memProps);
        for (uint32_t i = 0; i < memProps.memoryHeapCount; i++) {
            if (memProps.memoryHeaps[i].flags & VK_MEMORY_HEAP_DEVICE_LOCAL_BIT) {
                score += static_cast<int>(
                    memProps.memoryHeaps[i].size / (1024 * 1024 * 1024));
            }
        }

        score += VK_VERSION_MAJOR(deviceProperties.apiVersion) * 100;
        score += VK_VERSION_MINOR(deviceProperties.apiVersion) * 10;
        return score;
    }

    bool checkDeviceExtensionSupport(VkPhysicalDevice device) {
        const std::vector<const char*> required = {
            VK_KHR_SWAPCHAIN_EXTENSION_NAME,
        };
        uint32_t count;
        vkEnumerateDeviceExtensionProperties(device, nullptr, &count, nullptr);
        std::vector<VkExtensionProperties> available(count);
        vkEnumerateDeviceExtensionProperties(device, nullptr, &count,
                                             available.data());
        for (const char* name : required) {
            bool found = false;
            for (const auto& ext : available) {
                if (strcmp(name, ext.extensionName) == 0) {
                    found = true;
                    break;
                }
            }
            if (!found) return false;
        }
        return true;
    }

    QueueFamilyIndices findQueueFamilies(VkPhysicalDevice device,
                                         VkSurfaceKHR surface) {
        QueueFamilyIndices indices;
        uint32_t queueFamilyCount = 0;
        vkGetPhysicalDeviceQueueFamilyProperties(device, &queueFamilyCount,
                                                 nullptr);
        std::vector<VkQueueFamilyProperties> queueFamilies(queueFamilyCount);
        vkGetPhysicalDeviceQueueFamilyProperties(device, &queueFamilyCount,
                                                 queueFamilies.data());

        for (uint32_t i = 0; i < queueFamilyCount; i++) {
            const auto& family = queueFamilies[i];
            if (family.queueFlags & VK_QUEUE_GRAPHICS_BIT) {
                indices.graphicsFamily = i;
            }
            VkBool32 presentSupport = false;
            vkGetPhysicalDeviceSurfaceSupportKHR(device, i, surface,
                                                 &presentSupport);
            if (presentSupport) {
                indices.presentFamily = i;
            }
            // Prefer dedicated compute/transfer families (async compute, DMA).
            if ((family.queueFlags & VK_QUEUE_COMPUTE_BIT) &&
                !(family.queueFlags & VK_QUEUE_GRAPHICS_BIT)) {
                indices.computeFamily = i;
            }
            if ((family.queueFlags & VK_QUEUE_TRANSFER_BIT) &&
                !(family.queueFlags & VK_QUEUE_GRAPHICS_BIT) &&
                !(family.queueFlags & VK_QUEUE_COMPUTE_BIT)) {
                indices.transferFamily = i;
            }
        }
        return indices;
    }
};
```
Physical Device Properties and Features
The VkPhysicalDeviceProperties structure contains:
deviceType: Discrete, integrated, virtual, CPU, other
deviceName: Human-readable name string
vendorID / deviceID: PCI IDs
apiVersion / driverVersion: Supported API version, driver version
limits: A large struct with dozens of hardware limits (max texture size, max push constant size, max vertex attributes, etc.)
The VkPhysicalDeviceFeatures structure contains boolean flags for optional features:
geometryShader: Geometry shader support
tessellationShader: Tessellation support
samplerAnisotropy: Anisotropic filtering
textureCompressionBC / textureCompressionETC2 / textureCompressionASTC_LDR: Texture compression formats
multiDrawIndirect: Draw multiple primitives with one call
wideLines, largePoints: Extended primitive sizes
For Vulkan 1.1+ features, you chain additional structs onto VkPhysicalDeviceFeatures2 :
```cpp
VkPhysicalDeviceFeatures2 features2{};
features2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2;

VkPhysicalDeviceVulkan12Features features12{};
features12.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_VULKAN_1_2_FEATURES;
features2.pNext = &features12;

VkPhysicalDeviceVulkan13Features features13{};
features13.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_VULKAN_1_3_FEATURES;
features12.pNext = &features13;

vkGetPhysicalDeviceFeatures2(physicalDevice, &features2);

if (features12.descriptorIndexing) {
    std::cout << "Bindless resources supported!\n";
}
if (features13.dynamicRendering) {
    std::cout << "Dynamic rendering supported!\n";
}
```
6.4 Creating the Logical Device
The VkDevice (logical device) is the primary interface for creating resources and submitting work. It represents a logical connection to a physical device with a specific set of features and extensions enabled.
```cpp
struct DeviceContext {
    VkPhysicalDevice physicalDevice = VK_NULL_HANDLE;
    VkDevice device = VK_NULL_HANDLE;
    VkQueue graphicsQueue = VK_NULL_HANDLE;
    VkQueue presentQueue = VK_NULL_HANDLE;
    VkQueue computeQueue = VK_NULL_HANDLE;
    uint32_t graphicsQueueFamily = UINT32_MAX;
    uint32_t presentQueueFamily = UINT32_MAX;
    uint32_t computeQueueFamily = UINT32_MAX;

    void create(VkPhysicalDevice physDev, QueueFamilyIndices indices,
                const std::vector<const char*>& extensions,
                const std::vector<const char*>& validationLayers) {
        physicalDevice = physDev;
        graphicsQueueFamily = indices.graphicsFamily.value();
        presentQueueFamily = indices.presentFamily.value();
        if (indices.computeFamily.has_value()) {
            computeQueueFamily = indices.computeFamily.value();
        }

        // One VkDeviceQueueCreateInfo per unique family (families may overlap).
        std::set<uint32_t> uniqueFamilies = {
            graphicsQueueFamily, presentQueueFamily
        };
        if (computeQueueFamily != UINT32_MAX) {
            uniqueFamilies.insert(computeQueueFamily);
        }

        float queuePriority = 1.0f;
        std::vector<VkDeviceQueueCreateInfo> queueCreateInfos;
        for (uint32_t family : uniqueFamilies) {
            VkDeviceQueueCreateInfo queueInfo{};
            queueInfo.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
            queueInfo.queueFamilyIndex = family;
            queueInfo.queueCount = 1;
            queueInfo.pQueuePriorities = &queuePriority;
            queueCreateInfos.push_back(queueInfo);
        }

        VkPhysicalDeviceFeatures deviceFeatures{};
        deviceFeatures.samplerAnisotropy = VK_TRUE;
        deviceFeatures.fillModeNonSolid = VK_TRUE;
        deviceFeatures.wideLines = VK_TRUE;

        VkPhysicalDeviceVulkan12Features features12{};
        features12.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_VULKAN_1_2_FEATURES;
        features12.descriptorIndexing = VK_TRUE;
        features12.runtimeDescriptorArray = VK_TRUE;
        features12.bufferDeviceAddress = VK_TRUE;

        VkPhysicalDeviceVulkan13Features features13{};
        features13.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_VULKAN_1_3_FEATURES;
        features13.dynamicRendering = VK_TRUE;
        features13.synchronization2 = VK_TRUE;
        features12.pNext = &features13;

        VkDeviceCreateInfo createInfo{};
        createInfo.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
        createInfo.pNext = &features12;
        createInfo.pQueueCreateInfos = queueCreateInfos.data();
        createInfo.queueCreateInfoCount =
            static_cast<uint32_t>(queueCreateInfos.size());
        createInfo.pEnabledFeatures = &deviceFeatures;
        createInfo.enabledExtensionCount =
            static_cast<uint32_t>(extensions.size());
        createInfo.ppEnabledExtensionNames = extensions.data();
        createInfo.enabledLayerCount =
            static_cast<uint32_t>(validationLayers.size());
        createInfo.ppEnabledLayerNames = validationLayers.data();

        VK_CHECK(vkCreateDevice(physDev, &createInfo, nullptr, &device));

        vkGetDeviceQueue(device, graphicsQueueFamily, 0, &graphicsQueue);
        vkGetDeviceQueue(device, presentQueueFamily, 0, &presentQueue);
        if (computeQueueFamily != UINT32_MAX) {
            vkGetDeviceQueue(device, computeQueueFamily, 0, &computeQueue);
        }

        std::cout << "Logical device created successfully\n";
        std::cout << "  Graphics queue family: " << graphicsQueueFamily << "\n";
        std::cout << "  Present queue family: " << presentQueueFamily << "\n";
    }

    void destroy() {
        if (device != VK_NULL_HANDLE) {
            vkDestroyDevice(device, nullptr);
            device = VK_NULL_HANDLE;
        }
    }
};
```
Note the pNext chain used to enable Vulkan 1.2 and 1.3 features. This pattern — chaining feature structs through pNext pointers — is pervasive in Vulkan and allows the API to be extended without breaking existing code.
6.5 Waiting for Device Idle
A common pattern in Vulkan is needing to wait for the GPU to finish all work before destroying resources or recreating the swap chain:
vkDeviceWaitIdle (device);
This is a synchronization point that stalls the CPU until the GPU is completely idle. Use it sparingly — it's appropriate when shutting down or recreating major resources (like during window resize), but not during normal rendering.
Chapter 7: Window Surfaces and the Swap Chain
A Vulkan surface represents a platform-specific window that Vulkan can render into. A swap chain manages a collection of images that are alternately presented to the display and rendered into.
7.1 Creating a Window with GLFW
GLFW (Graphics Library Framework) handles window creation and input in a cross-platform way. It also handles the platform-specific Vulkan surface creation:
```cpp
class Window {
public:
    GLFWwindow* window = nullptr;
    int width, height;
    bool framebufferResized = false;

    Window(int w, int h, const char* title) : width(w), height(h) {
        glfwInit();
        glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);  // no OpenGL context
        glfwWindowHint(GLFW_RESIZABLE, GLFW_TRUE);
        window = glfwCreateWindow(w, h, title, nullptr, nullptr);
        glfwSetWindowUserPointer(window, this);
        glfwSetFramebufferSizeCallback(window, framebufferResizeCallback);
    }

    ~Window() {
        glfwDestroyWindow(window);
        glfwTerminate();
    }

    bool shouldClose() const { return glfwWindowShouldClose(window); }
    void pollEvents() { glfwPollEvents(); }

    VkSurfaceKHR createSurface(VkInstance instance) {
        VkSurfaceKHR surface;
        if (glfwCreateWindowSurface(instance, window, nullptr, &surface)
                != VK_SUCCESS) {
            throw std::runtime_error("Failed to create window surface!");
        }
        return surface;
    }

    std::pair<int, int> getFramebufferSize() const {
        int w, h;
        glfwGetFramebufferSize(window, &w, &h);
        return {w, h};
    }

    void waitWhileMinimized() {
        auto [w, h] = getFramebufferSize();
        while (w == 0 || h == 0) {
            glfwWaitEvents();
            std::tie(w, h) = getFramebufferSize();
        }
    }

private:
    static void framebufferResizeCallback(GLFWwindow* window, int width,
                                          int height) {
        auto app = static_cast<Window*>(glfwGetWindowUserPointer(window));
        app->framebufferResized = true;
        app->width = width;
        app->height = height;
    }
};
```
7.2 Swap Chain Support Details
Before creating a swap chain, we need to query what the surface supports:
```cpp
struct SwapChainSupportDetails {
    VkSurfaceCapabilitiesKHR capabilities;
    std::vector<VkSurfaceFormatKHR> formats;
    std::vector<VkPresentModeKHR> presentModes;
};

SwapChainSupportDetails querySwapChainSupport(VkPhysicalDevice device,
                                              VkSurfaceKHR surface) {
    SwapChainSupportDetails details;
    vkGetPhysicalDeviceSurfaceCapabilitiesKHR(device, surface,
                                              &details.capabilities);

    uint32_t formatCount;
    vkGetPhysicalDeviceSurfaceFormatsKHR(device, surface, &formatCount, nullptr);
    if (formatCount != 0) {
        details.formats.resize(formatCount);
        vkGetPhysicalDeviceSurfaceFormatsKHR(device, surface, &formatCount,
                                             details.formats.data());
    }

    uint32_t presentModeCount;
    vkGetPhysicalDeviceSurfacePresentModesKHR(device, surface,
                                              &presentModeCount, nullptr);
    if (presentModeCount != 0) {
        details.presentModes.resize(presentModeCount);
        vkGetPhysicalDeviceSurfacePresentModesKHR(
            device, surface, &presentModeCount, details.presentModes.data());
    }
    return details;
}
```
Choosing the Surface Format
The surface format determines the pixel format and color space of swap chain images:
VkSurfaceFormatKHR chooseSwapSurfaceFormat(const std::vector<VkSurfaceFormatKHR>& availableFormats) {
    for (const auto& format : availableFormats) {
        if (format.format == VK_FORMAT_B8G8R8A8_SRGB &&
            format.colorSpace == VK_COLOR_SPACE_SRGB_NONLINEAR_KHR) {
            return format;
        }
    }
    for (const auto& format : availableFormats) {
        if (format.format == VK_FORMAT_B8G8R8A8_UNORM &&
            format.colorSpace == VK_COLOR_SPACE_SRGB_NONLINEAR_KHR) {
            return format;
        }
    }
    return availableFormats[0];
}
For HDR (High Dynamic Range) displays, you'd look for VK_COLOR_SPACE_HDR10_ST2084_EXT or similar.
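As a sketch of that path, the chooser below prefers a 10-bit format paired with the HDR10 (ST.2084/PQ) color space before falling back to the SDR logic above. This assumes the VK_EXT_swapchain_colorspace instance extension is enabled; the helper name and the exact format/color-space pair a driver offers are implementation-dependent.

```cpp
#include <vulkan/vulkan.h>
#include <vector>

// Hypothetical HDR-aware chooser: prefer a 10-bit format with the HDR10
// color space; otherwise fall back to the SDR chooser defined earlier.
VkSurfaceFormatKHR chooseSwapSurfaceFormatHDR(
        const std::vector<VkSurfaceFormatKHR>& availableFormats) {
    for (const auto& format : availableFormats) {
        if (format.format == VK_FORMAT_A2B10G10R10_UNORM_PACK32 &&
            format.colorSpace == VK_COLOR_SPACE_HDR10_ST2084_EXT) {
            return format;
        }
    }
    // SDR fallback (chooseSwapSurfaceFormat as shown above).
    return chooseSwapSurfaceFormat(availableFormats);
}
```

Note that rendering to an HDR10 surface also means your final pass must encode output with the PQ transfer function rather than sRGB.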
Choosing the Present Mode
Present mode controls the relationship between the renderer and the display's refresh cycle:
VkPresentModeKHR chooseSwapPresentMode(const std::vector<VkPresentModeKHR>& availableModes, bool vsync = true) {
    if (!vsync) {
        for (const auto& mode : availableModes) {
            if (mode == VK_PRESENT_MODE_IMMEDIATE_KHR) {
                return mode;
            }
        }
    }
    for (const auto& mode : availableModes) {
        if (mode == VK_PRESENT_MODE_MAILBOX_KHR) {
            return mode;
        }
    }
    return VK_PRESENT_MODE_FIFO_KHR;
}
The four standard present modes:
VK_PRESENT_MODE_IMMEDIATE_KHR: Presentation does not wait for the vertical blank; the frame is displayed immediately. Lowest latency, but can tear.
VK_PRESENT_MODE_MAILBOX_KHR: Triple buffering. While a frame is being displayed, a single pending image waits in the queue; a newly presented image replaces it rather than blocking. Low latency, no tearing.
VK_PRESENT_MODE_FIFO_KHR: Frames are queued; the application blocks when the queue is full. Classic vsync. The only mode guaranteed to be available.
VK_PRESENT_MODE_FIFO_RELAXED_KHR: Like FIFO, but if the application is slow and a frame was displayed for more than one refresh period, the next frame is displayed immediately (no tearing except at low framerates).
Choosing the Swap Extent
The swap extent is the resolution of swap chain images:
VkExtent2D chooseSwapExtent(const VkSurfaceCapabilitiesKHR& capabilities,
                            int framebufferWidth, int framebufferHeight) {
    if (capabilities.currentExtent.width != UINT32_MAX) {
        return capabilities.currentExtent;
    }
    VkExtent2D actualExtent = {
        static_cast<uint32_t>(framebufferWidth),
        static_cast<uint32_t>(framebufferHeight)
    };
    actualExtent.width = std::clamp(actualExtent.width,
        capabilities.minImageExtent.width, capabilities.maxImageExtent.width);
    actualExtent.height = std::clamp(actualExtent.height,
        capabilities.minImageExtent.height, capabilities.maxImageExtent.height);
    return actualExtent;
}
7.3 Creating the Swap Chain
Now we can create the swap chain:
class SwapChain {
public:
    VkSwapchainKHR swapchain = VK_NULL_HANDLE;
    std::vector<VkImage> images;
    std::vector<VkImageView> imageViews;
    VkFormat imageFormat;
    VkExtent2D extent;
    uint32_t imageCount;

    void create(VkDevice device, VkPhysicalDevice physDevice, VkSurfaceKHR surface,
                int framebufferWidth, int framebufferHeight,
                uint32_t graphicsFamily, uint32_t presentFamily) {
        this->device = device;

        SwapChainSupportDetails support = querySwapChainSupport(physDevice, surface);
        VkSurfaceFormatKHR surfaceFormat = chooseSwapSurfaceFormat(support.formats);
        VkPresentModeKHR presentMode = chooseSwapPresentMode(support.presentModes, true);
        VkExtent2D swapExtent = chooseSwapExtent(support.capabilities, framebufferWidth, framebufferHeight);

        imageCount = support.capabilities.minImageCount + 1;
        if (support.capabilities.maxImageCount > 0) {
            imageCount = std::min(imageCount, support.capabilities.maxImageCount);
        }

        VkSwapchainCreateInfoKHR createInfo{};
        createInfo.sType = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR;
        createInfo.surface = surface;
        createInfo.minImageCount = imageCount;
        createInfo.imageFormat = surfaceFormat.format;
        createInfo.imageColorSpace = surfaceFormat.colorSpace;
        createInfo.imageExtent = swapExtent;
        createInfo.imageArrayLayers = 1;
        createInfo.imageUsage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;

        uint32_t queueFamilyIndices[] = {graphicsFamily, presentFamily};
        if (graphicsFamily != presentFamily) {
            createInfo.imageSharingMode = VK_SHARING_MODE_CONCURRENT;
            createInfo.queueFamilyIndexCount = 2;
            createInfo.pQueueFamilyIndices = queueFamilyIndices;
        } else {
            createInfo.imageSharingMode = VK_SHARING_MODE_EXCLUSIVE;
        }

        createInfo.preTransform = support.capabilities.currentTransform;
        createInfo.compositeAlpha = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR;
        createInfo.presentMode = presentMode;
        createInfo.clipped = VK_TRUE;
        createInfo.oldSwapchain = VK_NULL_HANDLE;

        VK_CHECK(vkCreateSwapchainKHR(device, &createInfo, nullptr, &swapchain));

        imageFormat = surfaceFormat.format;
        extent = swapExtent;

        vkGetSwapchainImagesKHR(device, swapchain, &imageCount, nullptr);
        images.resize(imageCount);
        vkGetSwapchainImagesKHR(device, swapchain, &imageCount, images.data());

        createImageViews();

        std::cout << "Swap chain created: " << extent.width << "x" << extent.height
                  << ", " << imageCount << " images\n";
    }

    void destroy() {
        for (auto& view : imageViews) {
            vkDestroyImageView(device, view, nullptr);
        }
        imageViews.clear();
        if (swapchain != VK_NULL_HANDLE) {
            vkDestroySwapchainKHR(device, swapchain, nullptr);
            swapchain = VK_NULL_HANDLE;
        }
    }

private:
    VkDevice device;

    void createImageViews() {
        imageViews.resize(images.size());
        for (size_t i = 0; i < images.size(); i++) {
            imageViews[i] = createImageView(device, images[i], imageFormat,
                                            VK_IMAGE_ASPECT_COLOR_BIT, 1);
        }
    }
};
7.4 Image Views
A VkImageView describes how to interpret the data in a VkImage . An image view specifies:
Which portion of the image to access (mip levels, array layers)
How to interpret the image format (as color, depth, stencil, etc.)
Swizzle mapping (reorder RGBA components)
VkImageView createImageView(VkDevice device, VkImage image, VkFormat format,
                            VkImageAspectFlags aspectFlags, uint32_t mipLevels) {
    VkImageViewCreateInfo viewInfo{};
    viewInfo.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
    viewInfo.image = image;
    viewInfo.viewType = VK_IMAGE_VIEW_TYPE_2D;
    viewInfo.format = format;
    viewInfo.components.r = VK_COMPONENT_SWIZZLE_IDENTITY;
    viewInfo.components.g = VK_COMPONENT_SWIZZLE_IDENTITY;
    viewInfo.components.b = VK_COMPONENT_SWIZZLE_IDENTITY;
    viewInfo.components.a = VK_COMPONENT_SWIZZLE_IDENTITY;
    viewInfo.subresourceRange.aspectMask = aspectFlags;
    viewInfo.subresourceRange.baseMipLevel = 0;
    viewInfo.subresourceRange.levelCount = mipLevels;
    viewInfo.subresourceRange.baseArrayLayer = 0;
    viewInfo.subresourceRange.layerCount = 1;

    VkImageView imageView;
    VK_CHECK(vkCreateImageView(device, &viewInfo, nullptr, &imageView));
    return imageView;
}
7.5 Recreating the Swap Chain
When the window is resized, the swap chain must be recreated with the new dimensions:
void recreateSwapChain() {
    window.waitWhileMinimized();
    vkDeviceWaitIdle(device.device);

    swapChain.destroy();
    auto [w, h] = window.getFramebufferSize();
    swapChain.create(device.device, device.physicalDevice, surface, w, h,
                     device.graphicsQueueFamily, device.presentQueueFamily);

    recreateDepthBuffer();
    recreateFramebuffers();
}
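In practice, recreation is usually triggered by the return codes of vkAcquireNextImageKHR and vkQueuePresentKHR rather than by the resize callback alone. A sketch of the per-frame logic, assuming the helpers and members named in this chapter (semaphore creation and presentInfo setup are omitted):

```cpp
// Sketch: reacting to out-of-date/suboptimal swap chains inside the frame loop.
uint32_t imageIndex;
VkResult result = vkAcquireNextImageKHR(
    device, swapChain.swapchain, UINT64_MAX,
    imageAvailableSemaphore, VK_NULL_HANDLE, &imageIndex);

if (result == VK_ERROR_OUT_OF_DATE_KHR) {
    recreateSwapChain();   // surface changed; these images can no longer be presented
    return;                // skip this frame
} else if (result != VK_SUCCESS && result != VK_SUBOPTIMAL_KHR) {
    throw std::runtime_error("Failed to acquire swap chain image!");
}

// ... record and submit command buffers, then present ...

result = vkQueuePresentKHR(presentQueue, &presentInfo);
if (result == VK_ERROR_OUT_OF_DATE_KHR || result == VK_SUBOPTIMAL_KHR ||
    window.framebufferResized) {
    window.framebufferResized = false;
    recreateSwapChain();
} else if (result != VK_SUCCESS) {
    throw std::runtime_error("Failed to present swap chain image!");
}
```

Checking the resize flag as well as the result codes covers drivers that never report VK_ERROR_OUT_OF_DATE_KHR on resize.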
Chapter 8: The Render Pass and Framebuffers
A render pass describes the structure of a rendering operation: what attachments (color, depth, stencil buffers) are used, how they're loaded and stored, and how they transition between states. A framebuffer binds actual image views to the attachment slots defined by a render pass.
8.1 Render Pass Concepts
A render pass consists of:
Attachments: Descriptions of the images (color, depth, stencil) that the render pass reads from and writes to. Each attachment description specifies:
Format (VkFormat)
Sample count (for multisampling)
Load operation: how is the attachment's initial content handled?
    VK_ATTACHMENT_LOAD_OP_LOAD: Preserve existing content
    VK_ATTACHMENT_LOAD_OP_CLEAR: Clear to a specified value
    VK_ATTACHMENT_LOAD_OP_DONT_CARE: Content undefined (fastest on tile GPUs)
Store operation: what happens to the attachment content after the pass?
    VK_ATTACHMENT_STORE_OP_STORE: Write back to memory
    VK_ATTACHMENT_STORE_OP_DONT_CARE: Discard (save bandwidth on tile GPUs)
Initial and final image layouts
Subpasses: A render pass can contain multiple subpasses that can read results of previous subpasses via input attachments. On tile-based GPUs, data between subpasses can stay in the on-chip tile memory without going to main memory.
Subpass Dependencies: Explicit synchronization between subpasses (or between external operations and subpasses), specifying which pipeline stages and memory accesses must complete before the dependent subpass begins.
8.2 Creating a Basic Render Pass
For our initial renderer, we'll create a simple render pass with a color attachment and a depth attachment:
class RenderPass {
public:
    VkRenderPass renderPass = VK_NULL_HANDLE;

    void create(VkDevice device, VkFormat colorFormat, VkFormat depthFormat,
                VkSampleCountFlagBits msaaSamples = VK_SAMPLE_COUNT_1_BIT) {
        this->device = device;

        VkAttachmentDescription colorAttachment{};
        colorAttachment.format = colorFormat;
        colorAttachment.samples = msaaSamples;
        colorAttachment.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
        colorAttachment.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
        colorAttachment.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
        colorAttachment.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
        colorAttachment.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
        colorAttachment.finalLayout = (msaaSamples == VK_SAMPLE_COUNT_1_BIT)
            ? VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
            : VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;

        VkAttachmentDescription depthAttachment{};
        depthAttachment.format = depthFormat;
        depthAttachment.samples = msaaSamples;
        depthAttachment.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
        depthAttachment.storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
        depthAttachment.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
        depthAttachment.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
        depthAttachment.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
        depthAttachment.finalLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;

        std::vector<VkAttachmentDescription> attachments;
        attachments.push_back(colorAttachment);
        attachments.push_back(depthAttachment);

        VkAttachmentReference colorAttachmentRef{};
        colorAttachmentRef.attachment = 0;
        colorAttachmentRef.layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;

        VkAttachmentReference depthAttachmentRef{};
        depthAttachmentRef.attachment = 1;
        depthAttachmentRef.layout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;

        VkSubpassDescription subpass{};
        subpass.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;
        subpass.colorAttachmentCount = 1;
        subpass.pColorAttachments = &colorAttachmentRef;
        subpass.pDepthStencilAttachment = &depthAttachmentRef;

        VkSubpassDependency dependency{};
        dependency.srcSubpass = VK_SUBPASS_EXTERNAL;
        dependency.dstSubpass = 0;
        dependency.srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT |
                                  VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT;
        dependency.srcAccessMask = 0;
        dependency.dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT |
                                  VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT;
        dependency.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT |
                                   VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;

        VkRenderPassCreateInfo renderPassInfo{};
        renderPassInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
        renderPassInfo.attachmentCount = static_cast<uint32_t>(attachments.size());
        renderPassInfo.pAttachments = attachments.data();
        renderPassInfo.subpassCount = 1;
        renderPassInfo.pSubpasses = &subpass;
        renderPassInfo.dependencyCount = 1;
        renderPassInfo.pDependencies = &dependency;

        VK_CHECK(vkCreateRenderPass(device, &renderPassInfo, nullptr, &renderPass));
    }

    void destroy() {
        if (renderPass != VK_NULL_HANDLE) {
            vkDestroyRenderPass(device, renderPass, nullptr);
            renderPass = VK_NULL_HANDLE;
        }
    }

private:
    VkDevice device;
};
8.3 Image Layout Transitions
Image layouts are a critical concept in Vulkan. Images must be in the correct layout for the operation being performed. The key layouts:
VK_IMAGE_LAYOUT_UNDEFINED: Content is undefined. Use it as the old layout in a transition when you don't need the existing content preserved.
VK_IMAGE_LAYOUT_GENERAL: Can be used for any purpose, but usually not optimal for anything.
VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL: For color attachments in render passes.
VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL: For depth/stencil attachments.
VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL: For reading depth as a texture (shadow maps).
VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL: For reading in a shader.
VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL: For copies as source.
VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL: For copies as destination.
VK_IMAGE_LAYOUT_PRESENT_SRC_KHR: For presenting to the swap chain.
Transitions are performed using pipeline barriers or declared in render pass attachment descriptions (which handle transitions automatically at pass boundaries).
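As a minimal sketch of the barrier path, here is a manual transition for one common case: taking a freshly created image from UNDEFINED to TRANSFER_DST_OPTIMAL before copying data into it. The function name is illustrative, and cmd is assumed to be a command buffer in the recording state.

```cpp
#include <vulkan/vulkan.h>

// Sketch: transition an image so it can receive a transfer write.
void transitionToTransferDst(VkCommandBuffer cmd, VkImage image) {
    VkImageMemoryBarrier barrier{};
    barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
    barrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;        // old content discarded
    barrier.newLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
    barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED; // no ownership transfer
    barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.image = image;
    barrier.subresourceRange = {VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1};
    barrier.srcAccessMask = 0;                             // nothing to wait on
    barrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;  // make writes visible

    vkCmdPipelineBarrier(cmd,
        VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,   // source stage
        VK_PIPELINE_STAGE_TRANSFER_BIT,      // destination stage
        0,
        0, nullptr,   // memory barriers
        0, nullptr,   // buffer barriers
        1, &barrier); // image barriers
}
```

Other transitions follow the same pattern with different layout, stage, and access mask combinations.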
8.4 Creating Framebuffers
A framebuffer binds actual VkImageView objects to the attachment slots defined by a render pass:
class Framebuffer {
public:
    VkFramebuffer framebuffer = VK_NULL_HANDLE;

    void create(VkDevice device, VkRenderPass renderPass,
                const std::vector<VkImageView>& attachments, VkExtent2D extent) {
        this->device = device;

        VkFramebufferCreateInfo framebufferInfo{};
        framebufferInfo.sType = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
        framebufferInfo.renderPass = renderPass;
        framebufferInfo.attachmentCount = static_cast<uint32_t>(attachments.size());
        framebufferInfo.pAttachments = attachments.data();
        framebufferInfo.width = extent.width;
        framebufferInfo.height = extent.height;
        framebufferInfo.layers = 1;

        VK_CHECK(vkCreateFramebuffer(device, &framebufferInfo, nullptr, &framebuffer));
    }

    void destroy() {
        if (framebuffer != VK_NULL_HANDLE) {
            vkDestroyFramebuffer(device, framebuffer, nullptr);
            framebuffer = VK_NULL_HANDLE;
        }
    }

private:
    VkDevice device;
};
We create one framebuffer per swap chain image. Each framebuffer references:
The corresponding swap chain image view (color attachment)
The shared depth image view (depth attachment)
void createFramebuffers(VkDevice device, VkRenderPass renderPass,
                        const SwapChain& swapChain, VkImageView depthImageView,
                        std::vector<Framebuffer>& framebuffers) {
    framebuffers.resize(swapChain.imageViews.size());
    for (size_t i = 0; i < swapChain.imageViews.size(); i++) {
        std::vector<VkImageView> attachments = {
            swapChain.imageViews[i],   // color attachment
            depthImageView             // shared depth attachment
        };
        framebuffers[i].create(device, renderPass, attachments, swapChain.extent);
    }
}
Chapter 9: Shaders and SPIR-V
Shaders are programs that run on the GPU. Unlike OpenGL, which accepts GLSL source code and compiles it at runtime, Vulkan uses SPIR-V — a pre-compiled binary intermediate representation.
9.1 SPIR-V: The Intermediate Language
SPIR-V (Standard Portable Intermediate Representation V) was designed by Khronos as a common target for GPU shader languages. Instead of every driver implementing its own GLSL compiler, drivers implement a SPIR-V to machine code compiler. This separates concerns and enables:
Offline compilation: Compile shaders at build time, not at startup.
Language independence: Any shader language that can target SPIR-V works with Vulkan (GLSL, HLSL, Rust GPU, Slang).
Predictable behavior: The SPIR-V binary has well-defined semantics, leaving far less room for interpretation differences between vendor compilers.
Security: SPIR-V is structured and validated; it's harder to accidentally or maliciously crash the driver with SPIR-V than with arbitrary source code.
We write shaders in GLSL (or HLSL) and compile to SPIR-V using glslc (from Google) or glslangValidator (from Khronos):
glslc shader.vert -o shader.vert.spv
glslc shader.frag -o shader.frag.spv
9.2 GLSL for Vulkan
GLSL for Vulkan differs slightly from OpenGL GLSL. The key differences:
Explicit binding locations: Use layout(binding = N) for all resources
Push constants: A new block type for small, frequently-changing data
SPIR-V built-ins: Direct access to gl_VertexIndex, gl_InstanceIndex, etc.
No default uniforms: All uniforms must be in blocks
Vertex Shader Example
#version 450

layout(location = 0) in vec3 inPosition;
layout(location = 1) in vec3 inNormal;
layout(location = 2) in vec2 inTexCoord;
layout(location = 3) in vec4 inColor;

layout(location = 0) out vec3 fragPos;
layout(location = 1) out vec3 fragNormal;
layout(location = 2) out vec2 fragTexCoord;
layout(location = 3) out vec4 fragColor;

layout(set = 0, binding = 0) uniform UniformBufferObject {
    mat4 model;
    mat4 view;
    mat4 proj;
    mat4 normalMatrix;
    vec4 cameraPos;
} ubo;

layout(push_constant) uniform PushConstants {
    mat4 instanceTransform;
    vec4 tintColor;
} push;

void main() {
    vec4 worldPos = push.instanceTransform * ubo.model * vec4(inPosition, 1.0);
    vec4 clipPos = ubo.proj * ubo.view * worldPos;
    gl_Position = clipPos;

    fragPos = worldPos.xyz;
    fragNormal = normalize(mat3(ubo.normalMatrix) * inNormal);
    fragTexCoord = inTexCoord;
    fragColor = inColor;
}
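On the CPU side, the push_constant block above is fed with vkCmdPushConstants, and the pipeline layout must declare a matching range. A sketch, assuming commandBuffer and pipelineLayout already exist and the C++ struct mirrors the GLSL block layout:

```cpp
#include <vulkan/vulkan.h>
#include <glm/glm.hpp>

// Must match the GLSL push_constant block member-for-member.
struct PushConstants {
    glm::mat4 instanceTransform;
    glm::vec4 tintColor;
};

// Declared once, at pipeline layout creation time:
VkPushConstantRange range{};
range.stageFlags = VK_SHADER_STAGE_VERTEX_BIT;
range.offset = 0;
range.size = sizeof(PushConstants);  // must fit maxPushConstantsSize
                                     // (at least 128 bytes is guaranteed)

// At draw time, per object:
PushConstants pc{glm::mat4(1.0f), glm::vec4(1.0f)};
vkCmdPushConstants(commandBuffer, pipelineLayout,
                   VK_SHADER_STAGE_VERTEX_BIT, 0, sizeof(pc), &pc);
```

Because only the vertex shader reads the block here, VK_SHADER_STAGE_VERTEX_BIT suffices; add more stage bits if other stages read it.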
Fragment Shader Example
#version 450

layout(location = 0) in vec3 fragPos;
layout(location = 1) in vec3 fragNormal;
layout(location = 2) in vec2 fragTexCoord;
layout(location = 3) in vec4 fragColor;

layout(location = 0) out vec4 outColor;

layout(set = 0, binding = 0) uniform UniformBufferObject {
    mat4 model;
    mat4 view;
    mat4 proj;
    mat4 normalMatrix;
    vec4 cameraPos;
} ubo;

layout(set = 1, binding = 0) uniform sampler2D albedoTexture;
layout(set = 1, binding = 1) uniform sampler2D normalTexture;
layout(set = 1, binding = 2) uniform sampler2D roughnessMetallicTexture;

layout(set = 1, binding = 3) uniform MaterialUBO {
    vec4 baseColor;
    float metallic;
    float roughness;
    float emissiveFactor;
    float alphaCutoff;
} material;

struct Light {
    vec4 position;
    vec4 color;
    vec4 attenuation;
};

layout(set = 0, binding = 1) uniform LightsUBO {
    Light lights[8];
    int numLights;
    float ambientIntensity;
    vec2 padding;
} lighting;

const float PI = 3.14159265359;

float NDF_GGX(vec3 N, vec3 H, float roughness) {
    float a = roughness * roughness;
    float a2 = a * a;
    float NdotH = max(dot(N, H), 0.0);
    float NdotH2 = NdotH * NdotH;
    float denom = NdotH2 * (a2 - 1.0) + 1.0;
    denom = PI * denom * denom;
    return a2 / denom;
}

float GeometrySchlickGGX(float NdotV, float roughness) {
    float r = roughness + 1.0;
    float k = (r * r) / 8.0;
    return NdotV / (NdotV * (1.0 - k) + k);
}

float GeometrySmith(vec3 N, vec3 V, vec3 L, float roughness) {
    float NdotV = max(dot(N, V), 0.0);
    float NdotL = max(dot(N, L), 0.0);
    return GeometrySchlickGGX(NdotV, roughness) * GeometrySchlickGGX(NdotL, roughness);
}

vec3 FresnelSchlick(float cosTheta, vec3 F0) {
    return F0 + (1.0 - F0) * pow(clamp(1.0 - cosTheta, 0.0, 1.0), 5.0);
}

void main() {
    vec4 albedoSample = texture(albedoTexture, fragTexCoord);
    vec3 albedo = pow(albedoSample.rgb, vec3(2.2));
    albedo *= material.baseColor.rgb * fragColor.rgb;
    float alpha = albedoSample.a * material.baseColor.a * fragColor.a;
    if (alpha < material.alphaCutoff) discard;

    vec2 roughnessMetal = texture(roughnessMetallicTexture, fragTexCoord).gb;
    float roughness = roughnessMetal.x * material.roughness;
    float metallic = roughnessMetal.y * material.metallic;

    vec3 N = normalize(fragNormal);
    vec3 V = normalize(ubo.cameraPos.xyz - fragPos);
    vec3 F0 = mix(vec3(0.04), albedo, metallic);

    vec3 Lo = vec3(0.0);
    for (int i = 0; i < lighting.numLights; i++) {
        Light light = lighting.lights[i];
        vec3 L;
        float attenuation = 1.0;
        if (light.position.w == 0.0) {
            L = normalize(-light.position.xyz);   // directional light
        } else {
            vec3 lightVec = light.position.xyz - fragPos;
            float distance = length(lightVec);
            L = normalize(lightVec);
            float att = light.attenuation.x +
                        light.attenuation.y * distance +
                        light.attenuation.z * distance * distance;
            attenuation = 1.0 / max(att, 0.001);
        }
        vec3 H = normalize(V + L);
        vec3 radiance = light.color.rgb * light.color.w * attenuation;

        float NDF = NDF_GGX(N, H, roughness);
        float G = GeometrySmith(N, V, L, roughness);
        vec3 F = FresnelSchlick(max(dot(H, V), 0.0), F0);

        vec3 numerator = NDF * G * F;
        float denominator = 4.0 * max(dot(N, V), 0.0) * max(dot(N, L), 0.0) + 0.0001;
        vec3 specular = numerator / denominator;

        vec3 kS = F;
        vec3 kD = (vec3(1.0) - kS) * (1.0 - metallic);
        float NdotL = max(dot(N, L), 0.0);
        Lo += (kD * albedo / PI + specular) * radiance * NdotL;
    }

    vec3 ambient = vec3(lighting.ambientIntensity) * albedo;
    vec3 color = ambient + Lo;
    color += albedo * material.emissiveFactor;
    color = color / (color + vec3(1.0));   // Reinhard tone mapping
    outColor = vec4(color, alpha);
}
Compute Shader Example
#version 450

layout(local_size_x = 256, local_size_y = 1, local_size_z = 1) in;

struct Particle {
    vec4 position;
    vec4 velocity;
    vec4 color;
};

layout(set = 0, binding = 0) buffer ParticleBuffer {
    Particle particles[];
} particleBuffer;

layout(push_constant) uniform PushConstants {
    float deltaTime;
    float time;
    vec3 gravity;
    float padding;
} push;

void main() {
    uint index = gl_GlobalInvocationID.x;
    if (index >= particleBuffer.particles.length()) return;

    Particle p = particleBuffer.particles[index];
    p.velocity.xyz += push.gravity * push.deltaTime;
    p.position.xyz += p.velocity.xyz * push.deltaTime;
    p.position.w -= push.deltaTime;   // remaining lifetime

    if (p.position.w <= 0.0) {
        // Respawn at the origin with an index-derived direction and speed
        float angle = float(index) * 2.399;
        float speed = 2.0 + mod(float(index) * 0.01, 3.0);
        p.position = vec4(0.0, 0.0, 0.0, 2.0 + mod(float(index) * 0.1, 3.0));
        p.velocity = vec4(cos(angle) * speed,
                          5.0 + mod(float(index) * 0.02, 5.0),
                          sin(angle) * speed, 0.0);
        p.color = vec4(mod(float(index) * 0.03, 1.0),
                       mod(float(index) * 0.07, 1.0),
                       mod(float(index) * 0.11, 1.0), 1.0);
    }
    particleBuffer.particles[index] = p;
}
9.3 Loading SPIR-V and Creating Shader Modules
Once compiled, we load SPIR-V bytecode and create VkShaderModule objects:
std::vector<uint32_t> readSPIRV(const std::string& filename) {
    std::ifstream file(filename, std::ios::ate | std::ios::binary);
    if (!file.is_open()) {
        throw std::runtime_error("Failed to open shader file: " + filename);
    }
    size_t fileSize = static_cast<size_t>(file.tellg());
    if (fileSize % 4 != 0) {
        throw std::runtime_error("SPIR-V file size must be a multiple of 4: " + filename);
    }
    std::vector<uint32_t> buffer(fileSize / 4);
    file.seekg(0);
    file.read(reinterpret_cast<char*>(buffer.data()), fileSize);
    return buffer;
}

VkShaderModule createShaderModule(VkDevice device, const std::vector<uint32_t>& code) {
    VkShaderModuleCreateInfo createInfo{};
    createInfo.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
    createInfo.codeSize = code.size() * sizeof(uint32_t);
    createInfo.pCode = code.data();

    VkShaderModule shaderModule;
    VK_CHECK(vkCreateShaderModule(device, &createInfo, nullptr, &shaderModule));
    return shaderModule;
}
Shader modules are only needed during pipeline creation. After you've created the graphics pipeline, you can (and should) destroy the shader modules to free memory:
VkShaderModule vertShaderModule = createShaderModule(device, readSPIRV("shaders/mesh.vert.spv"));
VkShaderModule fragShaderModule = createShaderModule(device, readSPIRV("shaders/mesh.frag.spv"));

// ... create the graphics pipeline using these modules ...

vkDestroyShaderModule(device, vertShaderModule, nullptr);
vkDestroyShaderModule(device, fragShaderModule, nullptr);
9.4 Shader Reflection and Specialization Constants
Specialization constants are shader constants whose values are set at pipeline creation time (not uniform buffer updates at runtime). This allows the driver to optimize the shader for specific constant values (e.g., eliminating dead code branches):
layout(constant_id = 0) const int LIGHT_COUNT = 8;
layout(constant_id = 1) const bool ENABLE_SHADOWS = true;

void main() {
    for (int i = 0; i < LIGHT_COUNT; i++) {
        if (ENABLE_SHADOWS) {
            // ...
        }
    }
}
VkSpecializationMapEntry entries[2];
entries[0] = {0, 0, sizeof(int)};                  // constant_id 0: LIGHT_COUNT
entries[1] = {1, sizeof(int), sizeof(VkBool32)};   // constant_id 1: ENABLE_SHADOWS

struct SpecData {
    int lightCount = 4;
    VkBool32 enableShadows = VK_TRUE;
} specData;

VkSpecializationInfo specInfo{};
specInfo.mapEntryCount = 2;
specInfo.pMapEntries = entries;
specInfo.dataSize = sizeof(specData);
specInfo.pData = &specData;

VkPipelineShaderStageCreateInfo fragStageInfo{};
fragStageInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
fragStageInfo.stage = VK_SHADER_STAGE_FRAGMENT_BIT;
fragStageInfo.module = fragShaderModule;
fragStageInfo.pName = "main";
fragStageInfo.pSpecializationInfo = &specInfo;
Chapter 10: The Graphics Pipeline
The graphics pipeline is one of the most complex objects in Vulkan — but also one of the most powerful. It encodes virtually all the state that affects how geometry is rendered: which shaders to use, how vertices are laid out, how rasterization works, how depth testing is done, how blending works.
10.1 Pipeline State Overview
The graphics pipeline encompasses these states:
Shader stages: Which shaders run at each programmable stage
Vertex input state: How vertex data is organized in buffers
Input assembly state: How vertices are assembled into primitives
Viewport and scissor state: The viewport rectangle and clipping rectangle
Rasterization state: Fill mode, cull mode, front face winding
Multisample state: MSAA sample count and coverage
Depth/stencil state: Depth test function, write mask, stencil operations
Color blend state: Per-attachment blend equations and write masks
Dynamic state: Which states can be changed without recreating the pipeline
Pipeline layout: Descriptor set and push constant layouts
10.2 Vertex Input and Assembly
struct Vertex {
    glm::vec3 pos;
    glm::vec3 normal;
    glm::vec2 texCoord;
    glm::vec4 color;

    static VkVertexInputBindingDescription getBindingDescription() {
        VkVertexInputBindingDescription bindingDesc{};
        bindingDesc.binding = 0;
        bindingDesc.stride = sizeof(Vertex);
        bindingDesc.inputRate = VK_VERTEX_INPUT_RATE_VERTEX;
        return bindingDesc;
    }

    static std::array<VkVertexInputAttributeDescription, 4> getAttributeDescriptions() {
        std::array<VkVertexInputAttributeDescription, 4> attribs{};
        attribs[0].binding = 0;
        attribs[0].location = 0;
        attribs[0].format = VK_FORMAT_R32G32B32_SFLOAT;
        attribs[0].offset = offsetof(Vertex, pos);
        attribs[1].binding = 0;
        attribs[1].location = 1;
        attribs[1].format = VK_FORMAT_R32G32B32_SFLOAT;
        attribs[1].offset = offsetof(Vertex, normal);
        attribs[2].binding = 0;
        attribs[2].location = 2;
        attribs[2].format = VK_FORMAT_R32G32_SFLOAT;
        attribs[2].offset = offsetof(Vertex, texCoord);
        attribs[3].binding = 0;
        attribs[3].location = 3;
        attribs[3].format = VK_FORMAT_R32G32B32A32_SFLOAT;
        attribs[3].offset = offsetof(Vertex, color);
        return attribs;
    }

    bool operator==(const Vertex& other) const {
        return pos == other.pos && normal == other.normal &&
               texCoord == other.texCoord && color == other.color;
    }
};

VkPipelineVertexInputStateCreateInfo vertexInputInfo{};
vertexInputInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO;
auto bindingDesc = Vertex::getBindingDescription();
auto attrDescs = Vertex::getAttributeDescriptions();
vertexInputInfo.vertexBindingDescriptionCount = 1;
vertexInputInfo.pVertexBindingDescriptions = &bindingDesc;
vertexInputInfo.vertexAttributeDescriptionCount = static_cast<uint32_t>(attrDescs.size());
vertexInputInfo.pVertexAttributeDescriptions = attrDescs.data();

VkPipelineInputAssemblyStateCreateInfo inputAssembly{};
inputAssembly.sType = VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO;
inputAssembly.topology = VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST;
inputAssembly.primitiveRestartEnable = VK_FALSE;
The topology options:
VK_PRIMITIVE_TOPOLOGY_POINT_LIST: Individual points
VK_PRIMITIVE_TOPOLOGY_LINE_LIST: Pairs of vertices form lines
VK_PRIMITIVE_TOPOLOGY_LINE_STRIP: Connected line segments
VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST: Triples of vertices form triangles (most common)
VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP: Triangle strip (reuse vertices)
VK_PRIMITIVE_TOPOLOGY_TRIANGLE_FAN: All triangles share a common vertex
10.3 Viewport and Scissor
VkViewport viewport{};
viewport.x = 0.0f;
viewport.y = 0.0f;
viewport.width = static_cast<float>(swapChainExtent.width);
viewport.height = static_cast<float>(swapChainExtent.height);
viewport.minDepth = 0.0f;
viewport.maxDepth = 1.0f;

VkRect2D scissor{};
scissor.offset = {0, 0};
scissor.extent = swapChainExtent;

VkPipelineViewportStateCreateInfo viewportState{};
viewportState.sType = VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO;
viewportState.viewportCount = 1;
viewportState.pViewports = &viewport;
viewportState.scissorCount = 1;
viewportState.pScissors = &scissor;
Note: Many applications make viewport and scissor dynamic, meaning they can be changed per draw call without recreating the pipeline:
std::vector<VkDynamicState> dynamicStates = {
    VK_DYNAMIC_STATE_VIEWPORT,
    VK_DYNAMIC_STATE_SCISSOR
};

VkPipelineDynamicStateCreateInfo dynamicState{};
dynamicState.sType = VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO;
dynamicState.dynamicStateCount = static_cast<uint32_t>(dynamicStates.size());
dynamicState.pDynamicStates = dynamicStates.data();

// With dynamic viewport/scissor, the pipeline only needs the counts:
viewportState.viewportCount = 1;
viewportState.pViewports = nullptr;
viewportState.scissorCount = 1;
viewportState.pScissors = nullptr;

// The actual values are then set at command buffer recording time:
vkCmdSetViewport(commandBuffer, 0, 1, &viewport);
vkCmdSetScissor(commandBuffer, 0, 1, &scissor);
10.4 Rasterization State
VkPipelineRasterizationStateCreateInfo rasterizer{};
rasterizer.sType = VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO;
rasterizer.depthClampEnable = VK_FALSE;
rasterizer.rasterizerDiscardEnable = VK_FALSE;
rasterizer.polygonMode = VK_POLYGON_MODE_FILL;
rasterizer.lineWidth = 1.0f;
rasterizer.cullMode = VK_CULL_MODE_BACK_BIT;
rasterizer.frontFace = VK_FRONT_FACE_COUNTER_CLOCKWISE;
rasterizer.depthBiasEnable = VK_FALSE;
rasterizer.depthBiasConstantFactor = 0.0f;
rasterizer.depthBiasClamp = 0.0f;
rasterizer.depthBiasSlopeFactor = 0.0f;
The GLM Y-Axis Flip Issue
One common point of confusion: GLM (OpenGL Mathematics) uses a right-handed coordinate system where Y points up, but Vulkan's NDC has Y pointing down (clip-space Y is flipped relative to OpenGL).
The standard fix is to flip the Y in the projection matrix:
glm::mat4 proj = glm::perspective(
    glm::radians(45.0f),
    extent.width / (float)extent.height,
    0.1f, 1000.0f);
proj[1][1] *= -1;
Keep in mind that flipping Y reverses triangle winding, so wherever you apply the flip (projection matrix or shader), the frontFace setting must match; if your geometry now culls incorrectly, switch between VK_FRONT_FACE_COUNTER_CLOCKWISE and VK_FRONT_FACE_CLOCKWISE.
10.5 Multisample State
VkPipelineMultisampleStateCreateInfo multisampling{};
multisampling.sType = VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO;
multisampling.sampleShadingEnable = VK_FALSE;
multisampling.rasterizationSamples = msaaSamples;
multisampling.minSampleShading = 1.0f;
multisampling.pSampleMask = nullptr;
multisampling.alphaToCoverageEnable = VK_FALSE;
multisampling.alphaToOneEnable = VK_FALSE;
10.6 Depth and Stencil State
VkPipelineDepthStencilStateCreateInfo depthStencil{};
depthStencil.sType = VK_STRUCTURE_TYPE_PIPELINE_DEPTH_STENCIL_STATE_CREATE_INFO;
depthStencil.depthTestEnable = VK_TRUE;
depthStencil.depthWriteEnable = VK_TRUE;
depthStencil.depthCompareOp = VK_COMPARE_OP_LESS;
depthStencil.depthBoundsTestEnable = VK_FALSE;
depthStencil.minDepthBounds = 0.0f;
depthStencil.maxDepthBounds = 1.0f;
depthStencil.stencilTestEnable = VK_FALSE;
depthStencil.front = {};
depthStencil.back = {};
Compare operations:
VK_COMPARE_OP_NEVER: Never passes
VK_COMPARE_OP_LESS: Passes if fragment depth < stored depth (standard forward rendering)
VK_COMPARE_OP_EQUAL: For depth pre-pass optimization
VK_COMPARE_OP_LESS_OR_EQUAL: For skybox and other special cases
VK_COMPARE_OP_GREATER: For reverse-Z rendering (better precision)
VK_COMPARE_OP_ALWAYS: Always passes (effectively disables depth rejection)
Reverse-Z: Using a reversed depth range (near=1.0, far=0.0) and GREATER comparison provides better floating-point depth precision for distant objects, reducing z-fighting. This is increasingly standard in modern engines.
10.7 Color Blend State
VkPipelineColorBlendAttachmentState colorBlendAttachment{};
colorBlendAttachment.colorWriteMask = VK_COLOR_COMPONENT_R_BIT | VK_COLOR_COMPONENT_G_BIT |
                                      VK_COLOR_COMPONENT_B_BIT | VK_COLOR_COMPONENT_A_BIT;
colorBlendAttachment.blendEnable = VK_FALSE;

VkPipelineColorBlendStateCreateInfo colorBlending{};
colorBlending.sType = VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO;
colorBlending.logicOpEnable = VK_FALSE;
colorBlending.attachmentCount = 1;
colorBlending.pAttachments = &colorBlendAttachment;
colorBlending.blendConstants[0] = 0.0f;
colorBlending.blendConstants[1] = 0.0f;
colorBlending.blendConstants[2] = 0.0f;
colorBlending.blendConstants[3] = 0.0f;
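The listing above disables blending, which is correct for opaque geometry. For transparent surfaces, the common choice is standard (non-premultiplied) alpha blending, out = srcAlpha * src + (1 - srcAlpha) * dst. A sketch of the attachment state for that case:

```cpp
// Standard alpha blending:
// color = srcAlpha * srcColor + (1 - srcAlpha) * dstColor
colorBlendAttachment.blendEnable = VK_TRUE;
colorBlendAttachment.srcColorBlendFactor = VK_BLEND_FACTOR_SRC_ALPHA;
colorBlendAttachment.dstColorBlendFactor = VK_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA;
colorBlendAttachment.colorBlendOp = VK_BLEND_OP_ADD;
colorBlendAttachment.srcAlphaBlendFactor = VK_BLEND_FACTOR_ONE;
colorBlendAttachment.dstAlphaBlendFactor = VK_BLEND_FACTOR_ZERO;
colorBlendAttachment.alphaBlendOp = VK_BLEND_OP_ADD;
```

Remember that blending reads the existing framebuffer contents, so transparent objects must be drawn after opaque ones, typically sorted back to front.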
10.8 Pipeline Layout
The pipeline layout describes the descriptor sets and push constants that shaders can access:
VkDescriptorSetLayoutBinding uboLayoutBinding{};
uboLayoutBinding.binding = 0;
uboLayoutBinding.descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
uboLayoutBinding.descriptorCount = 1;
uboLayoutBinding.stageFlags = VK_SHADER_STAGE_VERTEX_BIT | VK_SHADER_STAGE_FRAGMENT_BIT;

VkDescriptorSetLayoutBinding samplerLayoutBinding{};
samplerLayoutBinding.binding = 1;
samplerLayoutBinding.descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
samplerLayoutBinding.descriptorCount = 1;
samplerLayoutBinding.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT;

std::array<VkDescriptorSetLayoutBinding, 2> bindings = { uboLayoutBinding, samplerLayoutBinding };

VkDescriptorSetLayoutCreateInfo layoutInfo{};
layoutInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
layoutInfo.bindingCount = static_cast<uint32_t>(bindings.size());
layoutInfo.pBindings = bindings.data();

VkDescriptorSetLayout descriptorSetLayout;
VK_CHECK(vkCreateDescriptorSetLayout(device, &layoutInfo, nullptr, &descriptorSetLayout));

VkPushConstantRange pushConstantRange{};
pushConstantRange.stageFlags = VK_SHADER_STAGE_VERTEX_BIT | VK_SHADER_STAGE_FRAGMENT_BIT;
pushConstantRange.offset = 0;
pushC