docs: add guidance on IndexDetails specification vs build strategy #6264

Draft
wjones127 wants to merge 1 commit into lance-format:main from wjones127:docs-discuss-index-details
@wjones127 (Contributor)

Adds a new "Index Details" section to the index overview page explaining
the distinction between two categories of parameters in index_details
protobuf messages:

  • Index specification — defines the target structure; determines
    query-time behavior; must be decoupled from dataset size; must fully
    describe an empty index
  • Build strategy — controls the construction process; irrelevant
    to query execution; should not be stored in index_details

Includes examples and a note on where to store build strategy parameters
if auditability is needed.
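
As a rough illustration of the split, here is a minimal sketch. All class and field names (`IvfSpec`, `IvfBuildStrategy`, `to_index_details`, and the individual parameters) are hypothetical and are not the actual Lance protobuf messages; the point is only that the specification alone is what gets persisted.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class IvfSpec:
    """Hypothetical index specification: what was built.

    Every field here would matter at query time and can describe
    an index with zero rows.
    """
    target_partition_size: int
    distance_type: str


@dataclass(frozen=True)
class IvfBuildStrategy:
    """Hypothetical build strategy: how the index was built.

    None of these fields matter once the index exists.
    """
    max_iterations: int
    sample_rate: int
    num_threads: int


def to_index_details(spec: IvfSpec) -> dict:
    """Serialize only the specification (stand-in for index_details)."""
    return asdict(spec)


spec = IvfSpec(target_partition_size=8192, distance_type="l2")
strategy = IvfBuildStrategy(max_iterations=50, sample_rate=256, num_threads=8)

# The strategy is used during construction but never persisted here.
details = to_index_details(spec)
```

In this sketch the build strategy object never reaches `details`; if auditability is needed, it would be recorded somewhere other than the stand-in for index_details.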

Looking for feedback on whether these criteria are the right ones and
whether the examples are well-chosen.

Explains the distinction between index specification parameters (what
was built, relevant at query time) and build strategy parameters (how
it was built, irrelevant to readers). Includes guidance on designing
spec parameters to be decoupled from dataset size, and the requirement
that spec parameters fully describe an empty index.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
github-actions bot added the "documentation" label (Improvements or additions to documentation) on Mar 23, 2026
@github-actions (Contributor)

Review

Clean, well-structured documentation. Two substantive points:

1. The num_partitions vs target_partition_size example may blur the spec/strategy line

The doc argues num_partitions is a worse spec parameter than target_partition_size because it's dataset-size-dependent. But num_partitions directly determines query-time behavior (which partitions to probe, nprobe semantics, recall characteristics) — making it firmly a specification parameter by the doc's own criteria. The real issue is that target_partition_size is a build-time intent that gets resolved to a concrete num_partitions during construction.

This risks confusing implementers: is num_partitions spec or strategy? By the four criteria listed, it's clearly spec (query engine needs it, affects recall, affects storage layout). Consider reframing the example to say that target_partition_size is better as the user-facing configuration that gets materialized into a concrete num_partitions in the spec, rather than implying num_partitions doesn't belong in index_details.
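
To make the suggested reframing concrete, here is a minimal sketch of how a user-facing `target_partition_size` could be resolved into a concrete `num_partitions` at build time. The function name, the ceiling-division rule, and the choice of one partition for an empty dataset are all assumptions for illustration, not Lance's actual behavior.

```python
def resolve_num_partitions(num_rows: int, target_partition_size: int) -> int:
    """Resolve a build-time intent into a concrete partition count.

    Assumption: partitions are sized by ceiling division, and an empty
    dataset still gets one partition so the index remains describable.
    """
    if target_partition_size <= 0:
        raise ValueError("target_partition_size must be positive")
    if num_rows < 0:
        raise ValueError("num_rows must be non-negative")
    # Integer ceiling division avoids floating-point rounding.
    partitions = -(-num_rows // target_partition_size)
    return max(1, partitions)


# The same intent yields different concrete counts as the dataset grows.
small = resolve_num_partitions(100_000, 8192)
large = resolve_num_partitions(1_000_000, 8192)
```

Under this framing, the resolved count is what a reader would need at query time, while `target_partition_size` records what the user asked for.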

2. "codebook data" as a spec parameter example

Codebook data can be substantial: a PQ codebook with 256 centroids × 64 subspaces × 4 bytes per float32 value is already 64 KiB at one dimension per subspace, and it grows linearly with the sub-vector dimension. Since index_details lives in the manifest protobuf, storing large binary blobs there could bloat manifest reads. Worth a note clarifying whether large derived data like codebooks should be stored in index_details or referenced from index files, with only the codebook parameters (number of centroids, number of subspaces, etc.) in the spec.
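
The size concern can be checked with a short calculation. This is a sketch under stated assumptions: float32 values, and a hypothetical `sub_vector_dim` parameter for the number of dimensions per subspace, which the estimate above does not state.

```python
def pq_codebook_bytes(
    num_centroids: int,
    num_subspaces: int,
    sub_vector_dim: int,
    bytes_per_value: int = 4,  # assumption: float32 centroids
) -> int:
    """Size in bytes of a dense PQ codebook.

    Each subspace holds num_centroids centroids, and each centroid
    has sub_vector_dim values.
    """
    return num_centroids * num_subspaces * sub_vector_dim * bytes_per_value


# One dimension per subspace reproduces the 64 KiB figure.
minimal = pq_codebook_bytes(256, 64, sub_vector_dim=1)

# A 1024-dimensional vector split into 64 subspaces (16 dims each).
typical = pq_codebook_bytes(256, 64, sub_vector_dim=16)
```

With 16 dimensions per subspace the codebook reaches 1 MiB, which is the kind of payload that would be costly to carry in every manifest read.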


No issues with formatting, structure, or placement in the doc. The spec-vs-strategy distinction is a valuable addition.
