The philosophy of brinicle

Building production-oriented vector search solutions

Nowadays, vector search is becoming a common component in many products: site search, recommendations, semantic autocomplete, support tooling, and AI agents that retrieve the right chunks before calling an LLM. The implementation choices change a lot as the data size grows. With a few thousand vectors, an exact k-NN scan can be perfectly fine. Once you move into larger collections, approximate nearest neighbor (ANN) indexing becomes the practical approach. You build an index, persist it, and query it efficiently.
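To make the "few thousand vectors" case concrete: an exact k-NN scan is just a distance computation over the whole collection, a few lines of NumPy with no index to build or maintain. A minimal sketch:

```python
import numpy as np

def exact_knn(query, vectors, k=5):
    """Brute-force exact k-NN under L2 distance.

    Scans every stored vector: O(n * d) per query, which is
    perfectly acceptable for a few thousand vectors.
    """
    dists = np.linalg.norm(vectors - query, axis=1)  # distance to every vector
    idx = np.argpartition(dists, k)[:k]              # k smallest, unordered
    return idx[np.argsort(dists[idx])]               # order the k hits by distance

rng = np.random.default_rng(0)
db = rng.standard_normal((5000, 128)).astype(np.float32)
q = rng.standard_normal(128).astype(np.float32)
top5 = exact_knn(q, db, k=5)
```

Once the scan itself becomes the bottleneck, that is the point where an ANN index with a persisted build starts paying for itself.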

At that point, many teams reach for a full-featured vector database because it bundles ANN with a service layer. That bundle is valuable when you need it. It also comes with a baseline overhead: extra moving parts, background processes, configuration surface area, and memory overhead that is often "always on" even for small-to-mid sized datasets. If you are deploying in tight containers, edge machines, or low-cost instances, the baseline matters as much as raw search speed.

So the first question is "what do I actually need to run in production" instead of "which system is fastest". If you need pre-filtering, rich metadata, payload indexing, authentication, replication, multi-tenancy, and operational tooling, and you are operating at tens of millions of vectors, then a full vector database is usually the right choice.

A lot of real systems sit in a different zone. They need fast ANN search on a dataset of no more than 10M vectors, plus the core lifecycle operations: insert, upsert, delete, and periodic rebuild/compaction. They already have a metadata store, so duplicating that layer inside a vector DB is redundant. In that setup, a full DB can feel like paying in RAM and operational complexity for features that go unused.

Building an index engine from scratch is also rarely worth it for most teams. It's time-consuming, and it pulls attention away from the core product. The usual alternative is in-process libraries such as FAISS and hnswlib. They are fast and accurate, yet they often push you toward a RAM-first model where large portions of the index and the vectors themselves live in memory; in some cases they consume more RAM than a full vector database. Production details such as persistence workflows, safe mutation, concurrency, and predictable memory growth must also be built on top of them.
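The RAM pressure of the RAM-first model is easy to estimate: the raw vectors alone dominate. A back-of-the-envelope calculation, counting only raw float32 components and ignoring graph links and library overhead (which add more on top):

```python
def raw_vector_ram_gib(n_vectors: int, dim: int, bytes_per_component: int = 4) -> float:
    """GiB needed just to hold the raw vectors in memory (float32 by default)."""
    return n_vectors * dim * bytes_per_component / 2**30

# 10M 768-dim float32 vectors: ~28.6 GiB before any index structure exists
print(round(raw_vector_ram_gib(10_000_000, 768), 1))  # -> 28.6

# Even 1M vectors at the same dimensionality needs ~2.9 GiB resident
print(round(raw_vector_ram_gib(1_000_000, 768), 1))   # -> 2.9
```

This is why a disk-first layout matters for small instances: keeping vectors on disk changes the resident footprint from "size of the dataset" to "size of the working set plus caches".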

brinicle: Disk-First ANN Indexing for Low-RAM Vector Search

brinicle Vector Engine targets this gap: a production-oriented ANN index engine designed to stay usable under strict resource budgets. It focuses on disk-first operation and low memory overhead, while still supporting the operations you typically need in a real service: build/load, search, insert/upsert/delete, and rebuild.

brinicle is an open source C++ vector index engine for approximate nearest neighbor search. It is built for disk-first operation and low-RAM environments. The goal is simple: keep RAM usage predictable, keep tail latency stable, and still hit high recall.

If your dataset is in the sub-10M range and your main constraint is resources (RAM caps, small instances, dense multi-tenant packing), or you're deploying an agent on a 512MB container and only need ANN + CRUD, brinicle is meant to give you the index layer you need without forcing you to adopt a full vector database.

brinicle supports:

  • Building and loading indexes
  • Parallel insert, upsert, delete, and rebuild
  • Search that is safe to run alongside those operations

It also ships with a Python wrapper (built with pybind11), so you can use it directly from Python.

What brinicle is, and what it is not

brinicle is an index engine. You embed it in a service or pair it with your own metadata store.
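The pairing pattern is straightforward: the index returns integer ids, and your service resolves them against its own store. A sketch with SQLite standing in for the metadata store; `ann_search` here is a placeholder for the index engine's search call, not brinicle's actual API:

```python
import sqlite3

# Metadata store owned by your service, not by the index engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [(1, "Intro to ANN", "/ann"),
     (2, "Disk-first indexes", "/disk"),
     (3, "HNSW basics", "/hnsw")],
)

def ann_search(query_vec, k):
    # Stand-in for the index engine: in a real service this call would
    # return the ids of the k approximate nearest neighbors.
    return [2, 3][:k]

def retrieve(query_vec, k=2):
    ids = ann_search(query_vec, k)
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT id, title, url FROM docs WHERE id IN ({placeholders})", ids
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    return [by_id[i] for i in ids]  # preserve the index's ranking order

results = retrieve(query_vec=None, k=2)
```

The index layer only ever sees vectors and ids; payloads, filtering, and access control stay in the service, where they usually already live.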

brinicle is not a vector database. It does not aim to provide database features like filtering, payload indexing, distributed replication, auth, or multi-tenancy. If you need those features, use a vector database.

This separation is intentional. The benchmarks show why: a full DB stack often has a baseline memory footprint that is not compatible with extreme RAM caps, even before you start tuning.

When brinicle is a fit

  • You're under 10M vectors and already have a metadata store
  • You must run in tight RAM (≤1–2GB) or pack many tenants per node
  • You want ANN + CRUD + rebuild/compaction, not DB features

When a vector DB is the right tool

  • You need filtering/payload indexing as part of retrieval
  • You need replication, auth, multi-tenancy, operational UI/tooling
  • You're operating at large scale (tens/hundreds of millions) and want a managed service

What you trade: lower baseline RAM and a simpler stack, in exchange for bringing your own metadata and service layer.

Project Links