Your /list endpoint works beautifully on day one. Response time: 12ms. The product team loves it. You ship it and move on.

Six months later, the table has 4 million rows. Page 1 still returns in 12ms. Page 800 takes 11 seconds. Page 1200 times out. A customer files a support ticket. Your DBA pulls the slow query log and shows you something you can’t unsee.

The query has been doing this the whole time — you just couldn’t tell until the data grew.

This post is about why OFFSET pagination is a trap, what to replace it with, and the genuinely hard problems that come up when you do.

What OFFSET Actually Does (And Why It Doesn’t Do What You Think)

Most engineers understand OFFSET conceptually — skip N rows, return the next M. What they don’t have is a physical model of what the database actually does to fulfill that instruction.

Here is what happens when you run:

SELECT * FROM orders ORDER BY created_at DESC LIMIT 50 OFFSET 10000;

The database:

1. Identifies rows matching your WHERE clause (if any).
2. Sorts them by created_at DESC.
3. Reads through 10,000 rows in order.
4. Discards every single one of them.
5. Returns the next 50.

Steps 3 and 4 are not a skip. They are a scan. The database physically visits all 10,000 preceding rows and throws them away. There is no shortcut. The index can speed up the sort, but it cannot eliminate the traversal, because the database has no way to know which physical row corresponds to “position 10,000” without counting.

This is why OFFSET pagination has O(N) cost that grows linearly with the page number. Page 1 reads 50 rows. Page 200 reads 10,000 rows. Page 1,000 reads 50,000 rows. Your fastest users — the ones who never go past page 2 — will never notice. Your power users, your exporters, your data pipelines that walk the entire dataset will bring the database to its knees.

The part that’s often missed: this isn’t just slow — it’s resource-multiplying slow. Under concurrent load, each deep-page request pins buffer pool pages and burns I/O for its entire scan duration. Ten concurrent users on page 1,000 is ten simultaneous 50,000-row scans. OFFSET doesn’t degrade gracefully. It degrades explosively.
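The cost curve is easy to model. A back-of-envelope sketch (assuming a fixed page size and no filter; the function names are mine, not from any library):

```python
# Toy cost model: rows the database must read to serve page N.
# Assumes a fixed page size and no WHERE filter -- illustrative only.

def rows_scanned_offset(page: int, page_size: int = 50) -> int:
    # OFFSET (page - 1) * page_size: every preceding row is read and discarded.
    return (page - 1) * page_size + page_size

def rows_scanned_cursor(page: int, page_size: int = 50) -> int:
    # Keyset cursor: an index seek, then page_size rows, regardless of depth.
    return page_size

for page in (1, 200, 1000):
    print(page, rows_scanned_offset(page), rows_scanned_cursor(page))
```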

Cursor-Based Pagination: The Index Seek

The fix is well-known in theory. Its details in practice are where engineers run into trouble.

Instead of:

SELECT * FROM orders ORDER BY created_at DESC LIMIT 50 OFFSET 1000;

Use:

SELECT * FROM orders WHERE created_at < '2024-03-15 10:23:44' ORDER BY created_at DESC LIMIT 50;

The cursor encodes the last-seen value of the sort key. On the next page, you resume from where you left off. The database does an index seek directly to that position and reads forward 50 rows. No scanning, no discarding. Constant time regardless of how deep into the dataset you are.
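A minimal runnable sketch of this pattern, using Python’s stdlib sqlite3 (the schema, data, and page size are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany("INSERT INTO orders (created_at) VALUES (?)",
                 [(f"2024-03-15 10:00:{i:02d}",) for i in range(60)])

def fetch_page(after_ts=None, limit=3):
    """Return one page plus the cursor (last-seen created_at) for the next."""
    if after_ts is None:
        rows = conn.execute(
            "SELECT id, created_at FROM orders "
            "ORDER BY created_at DESC LIMIT ?", (limit,)).fetchall()
    else:
        rows = conn.execute(
            "SELECT id, created_at FROM orders WHERE created_at < ? "
            "ORDER BY created_at DESC LIMIT ?", (after_ts, limit)).fetchall()
    return rows, (rows[-1][1] if rows else None)

page1, cursor = fetch_page()
page2, _ = fetch_page(cursor)   # index seek to the cursor, then read forward
```

Note this only works cleanly here because created_at happens to be unique in the sample data.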

This is the happy path. Now let’s talk about where it breaks.

The Non-Unique Sort Key Problem

The most common cursor-based pagination mistake: using a non-unique column as the cursor without accounting for ties.

Imagine sorting by status:

SELECT * FROM orders WHERE status > 'pending' ORDER BY status LIMIT 50;

status is not unique. Dozens of rows share the same value. When you paginate with WHERE status > last_seen_status, you skip over every row with exactly that status value. You silently lose data. There is no error. Users just never see those records.

The fix is a composite cursor. Always combine your sort key with a uniquely identifying column:

SELECT * FROM orders WHERE (created_at, id) < ('2024-03-15 10:23:44', 'ord_8f7d2a') ORDER BY created_at DESC, id DESC LIMIT 50;

This works because (created_at, id) is a deterministic, unique position in the result set. Even if 1,000 orders share the same created_at timestamp, their id values disambiguate them. Your composite index on (created_at, id) covers this query exactly.

The gotcha here: your API response must include both cursor components, and your client must send both. A cursor that is only the timestamp will silently corrupt pagination on any dataset with timestamp collisions. On a busy system, you will have timestamp collisions — guaranteed.
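The tie problem and the composite fix can both be demonstrated concretely. A sketch with stdlib sqlite3 (row-value comparisons require SQLite 3.15+, which any recent Python ships; the schema is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
# Worst case for a timestamp-only cursor: every row shares one timestamp.
conn.executemany("INSERT INTO orders (created_at) VALUES (?)",
                 [("2024-03-15 10:23:44",)] * 10)

def fetch_page(cursor=None, limit=4):
    if cursor is None:
        rows = conn.execute(
            "SELECT created_at, id FROM orders "
            "ORDER BY created_at DESC, id DESC LIMIT ?", (limit,)).fetchall()
    else:
        rows = conn.execute(
            "SELECT created_at, id FROM orders "
            "WHERE (created_at, id) < (?, ?) "
            "ORDER BY created_at DESC, id DESC LIMIT ?",
            (*cursor, limit)).fetchall()
    return rows, (rows[-1] if rows else None)

seen, cursor = [], None
while True:
    rows, cursor = fetch_page(cursor)
    if not rows:
        break
    seen.extend(r[1] for r in rows)
# All 10 ids come back exactly once, despite total timestamp collision.
```

A timestamp-only cursor on this data would skip every remaining row after page 1, because the WHERE clause excludes the shared timestamp.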

Encoding the Cursor

Exposing raw column values as cursors is a mistake that seems minor and isn’t.

If you return:

{ "next_cursor": "2024-03-15T10:23:44Z__ord_8f7d2a" }

You’ve leaked your sort column name, your ID format, your timestamp precision, and the fact that you use a composite cursor. Clients will parse it. Some will hardcode assumptions. When you change your pagination strategy — switching from timestamp to a sequence ID, or adding a third cursor component — you will break every client that ever decoded that string.

Use an opaque cursor. Base64-encode a small JSON object:

{ "v": 1, "ts": "2024-03-15T10:23:44Z", "id": "ord_8f7d2a" }

Base64 it. Return the string. The v (version) field lets you change the cursor format in the future while maintaining backward compatibility by decoding old cursors differently based on their version. This is not over-engineering — it’s the difference between a smooth migration and a breaking change.
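A sketch of the encode/decode pair, stdlib only (the field names mirror the example above):

```python
import base64
import json

def encode_cursor(ts: str, last_id: str) -> str:
    payload = {"v": 1, "ts": ts, "id": last_id}
    return base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()

def decode_cursor(token: str) -> dict:
    payload = json.loads(base64.urlsafe_b64decode(token.encode()))
    if payload.get("v") != 1:
        # Dispatch on version here when the cursor format changes.
        raise ValueError("unsupported cursor version")
    return payload

token = encode_cursor("2024-03-15T10:23:44Z", "ord_8f7d2a")
```

The version check is where a future v2 decoder would branch while old tokens keep working.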

Never put anything security-sensitive in a cursor without signing it. A cursor is user-controlled input. If your cursor encodes something like user_id or tenant_id to scope the query, a malicious client can craft a cursor that paginates another user’s data. Sign your cursors with HMAC or encrypt them.
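One way to sign a cursor with stdlib hmac (the secret handling is deliberately simplified; in a real service it comes from configuration or a KMS):

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side-secret"  # assumption: loaded securely in practice

def sign_cursor(payload: dict) -> str:
    body = base64.urlsafe_b64encode(
        json.dumps(payload, sort_keys=True).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def verify_cursor(token: str) -> dict:
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("tampered cursor")
    return json.loads(base64.urlsafe_b64decode(body))

token = sign_cursor({"v": 1, "ts": "2024-03-15T10:23:44Z", "id": "ord_8f7d2a"})
```

compare_digest is the constant-time comparison; a plain == leaks timing information.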

Stable Pagination Under Writes

This is the hardest problem in cursor pagination and the one most implementations get wrong.

OFFSET pagination has a well-known issue: if a row is inserted between page 1 and page 2, a row gets shifted and appears twice (or disappears). Cursor pagination solves this for inserts after your cursor position — they don’t affect what you’ve already seen. But it introduces a subtler problem for inserts within your result window.

Scenario: a client fetches page 1 of orders sorted by created_at DESC. They get rows with timestamps between T-100s and T-0s. While they’re processing page 1, a high-priority order is created with created_at = T-5s (backdated due to a system clock issue or retry). When they fetch page 2, their cursor is at T-100s. They will never see the T-5s record because it falls between T-0s (past page 1) and T-100s (their cursor position). It’s in the gap.

This is fundamentally unsolvable with pure cursor pagination on mutable, live data without additional mechanisms. Your options are:

Snapshot isolation at the query level: take your cursor at a specific transaction ID or MVCC snapshot. CockroachDB supports AS OF SYSTEM TIME for exactly this; in PostgreSQL, the closest equivalent is holding a REPEATABLE READ transaction open across pages. Either way, this is expensive for long pagination sessions.

Snapshot the result set: for export-style pagination over a bounded dataset, materialize the result into a temporary table or result cache at page 1. Subsequent pages read from the snapshot. Works well for batch exports; doesn’t work for real-time feeds.

Acknowledge and document the limitation: for most UIs, a record appearing in page 2 that a user already scrolled past on page 1 is not a disaster. For financial exports or data pipelines where completeness guarantees matter, you need one of the above.

Use a stable sequence instead of a timestamp: auto-incrementing IDs or ULIDs (sortable by creation time) are immune to backdating. If your domain allows it, prefer a monotonic sequence as the cursor column over a mutable timestamp.
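The backdating gap, and why a monotonic id sidesteps it, can be shown concretely (stdlib sqlite3; the timestamps are contrived to force the collision):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany("INSERT INTO orders (created_at) VALUES (?)",
                 [(f"2024-03-15 10:00:0{i}",) for i in range(6)])

# Page 1 (newest first): seconds 05, 04, 03. Timestamp cursor = "...:03".
page1 = conn.execute("SELECT id, created_at FROM orders "
                     "ORDER BY created_at DESC LIMIT 3").fetchall()
ts_cursor = page1[-1][1]

# A backdated insert lands inside page 1's already-consumed range.
conn.execute("INSERT INTO orders (created_at) "
             "VALUES ('2024-03-15 10:00:04.5')")

# Continuing from the timestamp cursor never surfaces the backdated row.
rest = conn.execute("SELECT created_at FROM orders WHERE created_at < ? "
                    "ORDER BY created_at DESC", (ts_cursor,)).fetchall()

# With an auto-increment id cursor (last id on page 1 was 4), the new row
# got id 7: it sorts AHEAD of where pagination began instead of into a gap.
new_id = conn.execute("SELECT id FROM orders WHERE created_at = "
                      "'2024-03-15 10:00:04.5'").fetchone()[0]
```

Under id ordering the late arrival shows up at the top of a refreshed feed rather than vanishing between pages.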

Multi-Column Filtering With a Cursor

Real-world list endpoints don’t just paginate — they filter. And filter + cursor is where most implementations quietly break.

The naive approach:

SELECT * FROM orders WHERE customer_id = 'cust_123' AND created_at < '2024-03-15T10:23:44Z' ORDER BY created_at DESC LIMIT 50;

This looks correct. It works correctly. But if you don’t have a composite index on (customer_id, created_at DESC), your database is scanning all orders for that customer, ordered by date, and resuming from the cursor. On a customer with 500,000 orders, that’s still slow.

The index for cursor pagination must include both your filter columns and your sort columns, in that order. Filter columns first (equality predicates), sort columns last. This is the covering index pattern for pagination:

CREATE INDEX idx_orders_customer_created ON orders (customer_id, created_at DESC, id DESC);

Now your query touches only the rows matching customer_id = 'cust_123' and scans forward from the cursor in O(returned rows) time. Without this index shape, you have cursor pagination that is still slow — just slow in a different way.
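It’s worth verifying the index is actually used. A sqlite3 sketch of the check (SQLite’s planner output differs from PostgreSQL’s EXPLAIN, but the shape of the verification is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
             "customer_id TEXT, created_at TEXT)")
conn.execute("CREATE INDEX idx_orders_customer_created "
             "ON orders (customer_id, created_at DESC, id DESC)")

# EXPLAIN QUERY PLAN rows are (id, parent, notused, detail).
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM orders "
    "WHERE customer_id = ? AND created_at < ? "
    "ORDER BY created_at DESC LIMIT 50",
    ("cust_123", "2024-03-15 10:23:44")).fetchall()

# Expect a SEARCH using the composite index, not a full table SCAN.
detail = " ".join(row[3] for row in plan)
```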

The variable sort problem: what if your API lets clients sort by different columns? Sort by amount, status, customer_name, created_at? Each sort order requires its own index shape for cursor pagination to be efficient. You cannot use one cursor for all sort columns. The cursor must encode the values of whichever column is currently the sort key.

If your /list endpoint supports 5 sortable columns, you need 5 composite indexes and 5 cursor formats. This is not hypothetical complexity — it’s the real cost of a flexible list API. Many teams hit this and retreat to OFFSET with a note that “cursor pagination doesn’t work for our use case.” What they actually discovered is that cursor pagination has upfront index design cost that OFFSET lazily defers until production.

The Count Problem

Every paginated UI wants a total count. “Showing 1–50 of 4,231 results.” It sounds trivial. It is not.

SELECT COUNT(*) FROM orders WHERE customer_id = 'cust_123' hits the same index as your paginated query and is usually fast if the filter selectivity is high. But SELECT COUNT(*) FROM orders WHERE status = 'active' on a table where 90% of rows are active is a full index scan of millions of rows — just to return a single number that most users will ignore.

The non-obvious cost of counts: PostgreSQL often cannot return COUNT(*) from the index alone for a filtered query. Index-only scans help, but any page not marked all-visible in the visibility map forces a heap visit to verify the row still exists (the MVCC visibility check). In a high-write table, this is slow in ways that don’t correlate with table size: a count on a table with heavy deletes or updates will be dramatically slower than one on an append-only table of the same row count, because dead rows must be traversed.

The patterns that actually work:

Make count optional. Only compute it when the client explicitly requests it: ?include_count=true. Default to omitting it. Most UIs need “has more pages” (a boolean), not an exact total.

Return a “has_more” flag instead: run LIMIT 51, return 50 results, set has_more = true if you got 51. Zero additional database work.

Estimate for large tables: PostgreSQL’s pg_class.reltuples gives a fast approximate count. For a “4,231 results” display, showing “~4,200 results” is almost always acceptable. Use exact counts only when precision is a business requirement.

Async counts: compute the count in a background job and cache it. Return it from cache with a staleness indicator. Users get their list immediately; the count appears 200ms later. This mirrors how Elasticsearch and most search engines handle it.

Never run COUNT on every paginated request by default. That’s two queries instead of one, and the second one often costs more than the first.
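The has_more trick is a few lines: over-fetch by one row, return page_size, and use the extra row only as a signal. A sketch with stdlib sqlite3 (schema illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders (id) VALUES (?)",
                 [(i,) for i in range(1, 121)])   # 120 rows

def fetch_page(after_id=0, page_size=50):
    rows = conn.execute(
        "SELECT id FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size + 1)).fetchall()     # fetch one extra row
    has_more = len(rows) > page_size              # extra row = more pages
    return rows[:page_size], has_more

page1, more1 = fetch_page()               # full page, more to come
page3, more3 = fetch_page(after_id=100)   # last partial page
```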

Keyset Pagination vs. Cursor Pagination: They’re Not the Same Thing

These terms are used interchangeably, but they describe subtly different things. Getting the distinction right matters when you’re building an API that others will integrate against.

Keyset pagination uses the actual values of your sort/filter columns as the continuation point:

GET /orders?after_created_at=2024-03-15T10:23:44Z&after_id=ord_8f7d2a

Cursor pagination encodes that information into an opaque token:

GET /orders?cursor=eyJ2IjoxLCJ0cyI6IjIwMjQtMDMtMTVUMTA6MjM6NDRaIiwiaWQiOiJvcmRfOGY3ZDJhIn0=

Keyset pagination exposes your schema. Cursor pagination hides it. Keyset pagination allows clients to construct their own continuation points (which can be a feature or a vulnerability). Cursor pagination treats the continuation token as a server-side black box.

For internal APIs and data pipelines where the client and server are developed together: keyset is fine, simpler to debug, easier to log and reproduce. For public APIs that third-party developers will integrate: use opaque cursors, version them, and treat them like you’d treat a serialization format with a long migration tail.

Seek Method in SQL: The Portable Pattern

When you have a composite cursor, the SQL gets awkward:

WHERE (created_at < '2024-03-15T10:23:44Z') OR (created_at = '2024-03-15T10:23:44Z' AND id < 'ord_8f7d2a')

This OR form correctly handles the row-value comparison, but many query builders and ORMs don’t generate it correctly. The clean, SQL-standard version is a row value comparison:

WHERE (created_at, id) < ('2024-03-15T10:23:44Z', 'ord_8f7d2a')

PostgreSQL handles this well and can use a composite index on (created_at, id) for this comparison. MySQL supports row value comparisons syntactically but has historically had optimizer issues that prevent it from using composite indexes for the comparison correctly — always verify with EXPLAIN in MySQL. SQLite handles it correctly.

If you’re using an ORM that doesn’t support row-value comparisons, you’ll need to write raw SQL for the cursor predicate. Don’t let the ORM’s limitations push you back to OFFSET. Write the raw query. Wrap it in a repository function. Test it.
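If the ORM can at least accept a raw WHERE fragment, the OR expansion generalizes to any number of cursor columns. A sketch (DESC ordering assumed; the function name is mine):

```python
def seek_predicate(columns):
    """Expand (c1, c2, ...) < (?, ?, ...) into the OR form, for DESC keyset.

    Bind parameters in order: for N columns the predicate takes
    1 + 2 + ... + N placeholders (e.g. (ts, ts, id) for two columns).
    """
    clauses = []
    for i, col in enumerate(columns):
        # Equality on all earlier columns, strict inequality on this one.
        parts = [f"{c} = ?" for c in columns[:i]] + [f"{col} < ?"]
        clauses.append("(" + " AND ".join(parts) + ")")
    return " OR ".join(clauses)

print(seek_predicate(["created_at", "id"]))
# (created_at < ?) OR (created_at = ? AND id < ?)
```

For ASC sort keys, flip < to >; a mixed ASC/DESC sort needs the operator chosen per column.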

When OFFSET Is Actually Fine

Cursor pagination has real costs: added complexity, index design upfront, opaque tokens to manage, no random-page access. OFFSET has a legitimate use in several scenarios:

Small, bounded datasets. If your list endpoint will never return more than 500 total rows — a settings page, a list of team members, a dropdown of countries — the OFFSET cost is negligible and the implementation simplicity is worth it.

User-facing “go to page N” navigation. Cursor pagination is inherently sequential. If your UI has a page number input where users type “47” to jump directly to page 47, cursor pagination can’t serve that natively. You’d have to precompute 46 cursor positions — impractical. OFFSET handles random-access pagination directly. The question is whether your dataset and access patterns can tolerate the cost.

Infrequent, low-concurrency admin interfaces. An internal admin tool that runs once a day, used by 3 people, paginating through 100,000 rows: the OFFSET cost is real but the impact is acceptable. Optimize for the code you write, not hypothetical scale you might never reach.

Reporting queries with bounded result sets. If your report is always filtered to a single customer’s 90-day history and that customer never has more than 2,000 records, OFFSET to page 40 is 2,000 rows scanned. That might be a 5ms query. The engineering cost of cursor pagination here outweighs the runtime savings.

The mistake isn’t using OFFSET — it’s using OFFSET without understanding its cost curve and not having a plan for when that curve becomes a problem.

A Practical Migration Path

If you have a live API with OFFSET pagination and need to migrate:

Step 1 — Add cursor support without removing OFFSET. Let clients opt in with a cursor parameter. If provided, use keyset. If absent, fall back to OFFSET. This gives new clients the fast path without breaking existing integrations.

Step 2 — Add the necessary composite indexes first, before enabling cursor pagination in production. A cursor-based query without the right index can be slower than OFFSET on small datasets.

Step 3 — Version the cursor format from day one. Even if your first cursor is simple, include a version field. You will change the sort columns, add encryption, or switch ID formats eventually.

Step 4 — Deprecate OFFSET with a sunset timeline, not an immediate removal. Log which clients are still using OFFSET. Contact them. Give a 6-month window. Turn it off.

Step 5 — Harden the cursor against tampering. Add HMAC signing before making the cursor endpoint public. An unsigned cursor on a multi-tenant API is an authorization bypass waiting to be found.

The Takeaway

OFFSET pagination isn’t a beginner mistake. It’s a default that made sense when datasets were small, that scales catastrophically as they grow, and that gives no warning until it’s already hurting production. The slow /list endpoint is almost always an OFFSET endpoint on a table that grew beyond what anyone anticipated.

Cursor pagination isn’t magic either. It requires composite indexes, careful cursor encoding, explicit handling of non-unique sort keys, and honest acknowledgment of what it can’t do — stable pagination over live mutations, random page access, and total counts on large filtered datasets.

The production-grade approach is to understand both tools, default to cursors for any list endpoint that touches more than a few thousand rows, design your indexes before writing the query, and treat the cursor as a versioned API artifact with the same care you’d give a response schema.