So although the OTree introduces a tree-shaped, spatial index, the underlying Iceberg/Delta table remains standard (additional fields are added to the metadata, which does not break existing engines). Query engines simply ignore the OTree when they perform reads. Writers (optionally) and table maintenance jobs (necessarily) do need to know about the OTree, as we want the layout to be governed by an adaptive index rather than static partitioning logic.

Ideally, writers will use the OTree index so that the index covers the whole dataset, ensuring locality is maintained from the very first moment data is written to the table. However, that requires the writer, such as Apache Spark, to use the Qbeast module when performing writes. Table maintenance jobs must use the module in order to apply the spatial layout to the Iceberg data files.
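To make the writer side concrete, here is a minimal PySpark-style sketch of writing through an OTree-aware format instead of a plain Delta/Iceberg write. The `qbeast` format name and the `columnsToIndex` option follow the qbeast-spark project's documented usage, but the paths, column names, and exact options here are illustrative assumptions rather than a definitive recipe.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("qbeast-write-sketch").getOrCreate()

# Hypothetical source data; any DataFrame works.
df = spark.read.parquet("s3://my-bucket/raw/events/")

(df.write
   .format("qbeast")                     # OTree-aware writer instead of plain "delta"/"iceberg"
   .option("columnsToIndex", "lat,lon")  # dimensions the OTree indexes (illustrative columns)
   .mode("append")
   .save("s3://my-bucket/tables/events"))
```

Engines that don't load the Qbeast module can still read the resulting table as a standard one; they simply never see the OTree.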

Although the OTree governs the layout of the entire table, the OTree itself is just lightweight metadata that describes the parent-child relationships (encoded in the cube ids) and, for each cube, the element count and the min/max weights. I won’t go into the details of weights, but they are an additional mechanism designed to enhance data distribution across the nodes. The normalized dimension bounds of each cube are established by the cube’s position in the tree, so there is no need to store them. Because of this, even a table with billions of rows can be represented by an OTree containing just a few thousand small metadata entries, typically amounting to a few megabytes in total. The tree is therefore cheap to store, fast to read, and easy to keep in memory, while still providing a view of the data’s spatial layout.
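To make the "just lightweight metadata" point concrete, here is a small sketch of my own (not Qbeast's actual data structures) showing the kind of per-cube entry described above, and how a cube's normalized bounds can be recomputed from its position in the tree rather than stored:

```python
from dataclasses import dataclass

@dataclass
class CubeEntry:
    # Illustrative per-cube metadata, not Qbeast's actual schema.
    cube_id: str        # path from the root, one child index per level, e.g. "0/3"
    element_count: int  # how many elements the cube holds
    min_weight: float   # lower bound of the weights stored in this cube
    max_weight: float   # upper bound of the weights stored in this cube

def cube_bounds(cube_id: str, dimensions: int = 2):
    """Derive a cube's normalized bounds purely from its position in the tree.

    Each level halves every dimension, and the child index at each level
    selects one of 2**dimensions sub-cubes, so nothing beyond the cube id
    is needed to know which region of space the cube covers.
    """
    lo = [0.0] * dimensions
    hi = [1.0] * dimensions
    for level in (cube_id.split("/") if cube_id else []):
        child = int(level)
        for d in range(dimensions):
            mid = (lo[d] + hi[d]) / 2
            if (child >> d) & 1:  # bit d of the child index picks the half for dimension d
                lo[d] = mid
            else:
                hi[d] = mid
    return lo, hi

# cube_bounds("0/3") -> ([0.25, 0.25], [0.5, 0.5]): two levels deep, so a
# quarter-width cube, while cube_bounds("") covers the whole normalized space.
```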

Final thoughts

It’s helpful to see all of this on a spectrum.

On the left, the classic B-tree clustered index: a strict, key-ordered global tree index that dictates exactly where every row lives. While great for selective OLTP workloads, it is far too rigid and expensive when the dataset grows and the queries become broad (reading millions of rows).

On the right, we have Iceberg/Delta’s approach: lightweight metadata describing the canonical set of files (without ordering), with a declared clustering strategy (partitioning and optional sort order) from which the table constantly drifts, requiring maintenance jobs to bound that drift.
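For concreteness, the maintenance that bounds this drift is the familiar periodic compaction/re-clustering job; a minimal Spark SQL sketch (table, columns, and catalog names are hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Delta Lake's re-clustering command; the table and columns are hypothetical.
spark.sql("OPTIMIZE events ZORDER BY (lat, lon)")

# The Iceberg equivalent is the rewrite_data_files procedure (catalog name hypothetical):
# spark.sql("CALL my_catalog.system.rewrite_data_files(table => 'db.events')")
```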

In the middle sits the OTree: a global tree index, but without the fine-grained rigidity of the B-tree. Instead of ordering individual rows, it divides the data space into coarse, adaptive regions that subdivide and merge as the distribution demands. This keeps it incredibly light while still determining where data should live: dense data ends up in narrow cubes and sparse data in wide cubes. The layout is self-correcting as the data distribution changes, avoiding imbalanced partitions.
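Here is a toy sketch of that adaptive-subdivision idea (my own simplification, not Qbeast's actual algorithm, which also keeps elements in parent cubes and uses weights): a cube accepts elements until it reaches a capacity, after which new elements fall through to its children, so dense regions grow deep, narrow cubes while sparse regions stay covered by shallow, wide ones.

```python
import random
from collections import defaultdict

CAPACITY = 100  # illustrative per-cube capacity

def insert(counts: dict, split: set, value: float) -> None:
    """Route a normalized 1-D value into the cube hierarchy, splitting on overflow."""
    cube, lo, hi = "", 0.0, 1.0
    while cube in split:                      # descend past cubes that have already split
        mid = (lo + hi) / 2
        child = "0" if value < mid else "1"
        cube = f"{cube}/{child}" if cube else child
        lo, hi = (lo, mid) if child == "0" else (mid, hi)
    counts[cube] += 1
    if counts[cube] >= CAPACITY:              # dense region: this cube subdivides
        split.add(cube)

counts: dict = defaultdict(int)
split: set = set()
# Skewed data: most values cluster around 0.1, so that region splits repeatedly.
for _ in range(10_000):
    insert(counts, split, min(max(random.gauss(0.1, 0.05), 0.0), 1.0))
print(sorted(counts))  # deep cube ids near 0.1, shallow ones elsewhere
```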

It’s fun to see the role of the index inverted: rather than being consulted at read time, it is used to shape the table as it is written, so that the layout remains close to optimal and the existing read-time optimizations of Iceberg and Delta become more effective. The OTree is there behind the scenes, and query engines that read from the tables have no idea that it exists.

There is a lot more to Qbeast than what I’ve covered here: there are additional mechanisms for ensuring even data distribution and making sampling efficient via random weights, but that’s too detailed for this post. The takeaway for me, I suppose, is that there are always more innovative ways of doing things, and we’re still early in the open table format / lakehouse game. There are plenty more innovations to come at all levels, from file formats and data organization to query engines.