LLM Systems
Vector Databases Compared - pgvector, Qdrant, Milvus, Weaviate, LanceDB
A practitioner's guide to picking a vector store, weighing index trade-offs against the operational cost of running yet another database alongside your primary store.
intermediate · 9 min read
The vector database market spent 2023-2024 colonising every infrastructure team's roadmap. Most of those teams did not need a new database; they needed an index on top of the one they already had. The serious question is not "which vector DB is fastest" - all the mature options are within a small constant factor of each other on the ANN benchmarks - but "what is the smallest piece of infrastructure that solves my retrieval problem for the next two years."
What the five contenders actually are
| Database | Architecture | Strongest at | Operational shape |
|---|---|---|---|
| pgvector | Postgres extension | Hybrid SQL + vector workloads, <10M vectors | Zero new infra - it's already Postgres |
| Qdrant | Standalone Rust server | Dedicated vector search, payload filtering | One more service, one more set of backups |
| Milvus | Distributed cloud-native | Billion-scale, high-QPS, horizontal scaling | Heaviest - separate coordinator, query, data nodes |
| Weaviate | Standalone Go server with GraphQL | Schema-driven RAG, native hybrid search | Mid-weight, opinionated data model |
| LanceDB | Embedded columnar (Lance format) | Edge, notebooks, ML workflows on object storage | No server - a library that reads columnar Lance files directly |
HNSW vs IVF - the index trade-off everyone hits
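HNSW (Hierarchical Navigable Small World) is close to universal across these engines, and pgvector, Milvus, and LanceDB also ship an IVF (Inverted File) variant; Qdrant and Weaviate are built around HNSW. Where you do have the choice, it matters more than the database choice for most teams.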
HNSW. A multi-layer proximity graph (Malkov & Yashunin 2016). Excellent recall-vs-latency, no training phase, supports dynamic inserts. Costs: high memory (the graph lives in RAM, typically 1.5-3x the raw vector bytes) and slow builds. This is the right default for under 50M vectors with low write rates.
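The 1.5-3x multiplier above can be turned into a rough budgeting aid. This is a sketch, assuming float32 vectors (4 bytes per dimension) and decimal gigabytes; `raw_vector_gb` and `hnsw_memory_gb` are hypothetical helpers, not part of any library, and the real overhead depends on graph parameters and the engine.

```python
def raw_vector_gb(n_vectors: int, dim: int, bytes_per_value: int = 4) -> float:
    """Size of the vectors alone, in decimal gigabytes (assumes float32 by default)."""
    return n_vectors * dim * bytes_per_value / 1e9


def hnsw_memory_gb(n_vectors: int, dim: int,
                   low: float = 1.5, high: float = 3.0) -> tuple[float, float]:
    """Rough resident-memory range, applying the 1.5-3x rule of thumb to the raw size."""
    raw = raw_vector_gb(n_vectors, dim)
    return raw * low, raw * high


if __name__ == "__main__":
    low_gb, high_gb = hnsw_memory_gb(10_000_000, 1024)
    print(f"raw: {raw_vector_gb(10_000_000, 1024):.1f} GB")
    print(f"HNSW resident estimate: {low_gb:.0f}-{high_gb:.0f} GB")
```

Treat the result as a sizing range for capacity planning, not a measurement; the only reliable number is the resident set size of a real index built on your data.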
IVF (IVFFlat, IVF_PQ). Cluster the corpus into N partitions, then search only the nprobe partitions closest to the query. Cheap to build, low memory, easy to shard. Recall drops as nprobe shrinks (raising it buys recall back at the cost of latency), and you need enough data to cluster meaningfully (sqrt(rows) lists is the rule of thumb). Right for 50M+ vectors where HNSW's memory cost stops fitting on a sensible box.
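The partition-and-probe idea can be sketched in a few lines of plain Python. This is a toy illustration, not how any of the engines implement it: centroids are sampled from the data rather than trained with k-means, distances are brute force, and all function names are made up for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, n_lists, seed=0):
    """Toy IVF build: sample centroids, then bucket each vector by nearest centroid.

    Real engines train centroids with k-means; sampling keeps the sketch short.
    """
    rng = random.Random(seed)
    centroids = rng.sample(vectors, n_lists)
    lists = [[] for _ in range(n_lists)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(n_lists), key=lambda c: sq_dist(vec, centroids[c]))
        lists[nearest].append(idx)
    return centroids, lists


def search_ivf(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe partitions whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in lists[c]]
    return sorted(candidates, key=lambda i: sq_dist(query, vectors[i]))[:k]


def brute_force(query, vectors, k):
    """Exact top-k, used as the ground truth."""
    return sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random() for _ in range(8)] for _ in range(2000)]
    query = [rng.random() for _ in range(8)]
    centroids, lists = build_ivf(data, n_lists=44)  # roughly sqrt(2000)
    truth = set(brute_force(query, data, k=10))
    for nprobe in (1, 4, 16, 44):
        hits = set(search_ivf(query, data, centroids, lists, k=10, nprobe=nprobe))
        print(f"nprobe={nprobe:2d} recall@10={len(hits & truth) / 10:.2f}")
```

Probing every partition degenerates into an exact scan, which is why recall can only go up as nprobe grows; the tuning question is how few partitions you can probe and still hit your recall target.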
Product Quantisation (PQ) on top. Both index families can compress vectors via PQ - split each vector into M sub-vectors and replace each with a codebook id. 8-32x memory reduction, 1-5 points of recall loss. Standard at billion scale.
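The 8-32x figure falls out of simple arithmetic. The sketch below assumes float32 input vectors and 256-entry codebooks (one byte per sub-vector id), and ignores the small fixed cost of storing the codebooks themselves; the helper names are hypothetical.

```python
import math


def pq_bytes_per_vector(n_subvectors: int, codebook_size: int = 256) -> int:
    """Bytes needed to store one PQ code: one codebook id per sub-vector."""
    bits_per_id = math.ceil(math.log2(codebook_size))
    return math.ceil(n_subvectors * bits_per_id / 8)


def pq_compression_ratio(dim: int, n_subvectors: int,
                         codebook_size: int = 256, bytes_per_value: int = 4) -> float:
    """Raw vector bytes divided by PQ code bytes (codebook storage ignored)."""
    if dim % n_subvectors != 0:
        raise ValueError("dim must be divisible by the number of sub-vectors")
    return dim * bytes_per_value / pq_bytes_per_vector(n_subvectors, codebook_size)


if __name__ == "__main__":
    for m in (512, 256, 128):
        print(f"dim=1024, M={m}: {pq_compression_ratio(1024, m):.0f}x smaller")
```

Under these assumptions, a 1024-dim vector lands at 8x with M=512 and 32x with M=128: fewer, longer sub-vectors compress harder and lose more recall.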
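```python
# Picking an index, as a rule-of-thumb sketch
def pick_index(vectors: int, write_qps: float) -> str:
    # Thresholds follow the guidance above: HNSW under ~50M vectors with low write rates.
    if vectors < 50_000_000 and write_qps < 100:
        return "HNSW"      # default - recall and latency win
    elif vectors < 100_000_000:
        return "IVF_FLAT"  # build cheap, search OK
    else:
        return "IVF_PQ"    # only thing that fits the budget
```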
When each one makes sense
pgvector. If you already run Postgres, this is your starting point. One extension, transactional inserts, joins against your existing tables, and your DBA already knows how to back it up. The cost is honesty about scale: HNSW in pgvector is slower to build than dedicated engines, and you compete with OLTP traffic for the same shared buffers. Comfortable up to ~10M vectors and ~100 QPS. Beyond that, the dedicated stores pull ahead.
Qdrant. The right next step when pgvector stops fitting. Rust server, payload filtering that runs alongside the HNSW traversal (not as a post-filter), and a clean REST/gRPC API. Quantisation (scalar, product, and binary) is first-class. Operationally a single-binary service with a snapshot story that does not require a separate cluster. The sweet spot is "we need a real vector DB but we are not Spotify."
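Why filtering during the search matters is easiest to see with a toy comparison. The sketch below uses brute-force distance as a stand-in for HNSW traversal and invented point dictionaries; it does not use the Qdrant client or reflect its internals, only the difference between the two strategies.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def post_filter_search(query, points, k, predicate):
    """Take the k nearest first, then drop those that fail the filter."""
    nearest = sorted(points, key=lambda p: sq_dist(query, p["vector"]))[:k]
    return [p["id"] for p in nearest if predicate(p["payload"])]


def filtered_search(query, points, k, predicate):
    """Apply the filter while selecting candidates, so k matches can still come back."""
    allowed = (p for p in points if predicate(p["payload"]))
    ranked = sorted(allowed, key=lambda p: sq_dist(query, p["vector"]))
    return [p["id"] for p in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical corpus: every fifth point is tagged "en", the rest "de".
    points = [
        {"id": i, "vector": [float(i), 0.0],
         "payload": {"lang": "en" if i % 5 == 0 else "de"}}
        for i in range(20)
    ]

    def is_english(payload):
        return payload["lang"] == "en"

    print("post-filter:", post_filter_search([0.0, 0.0], points, 3, is_english))
    print("filtered:   ", filtered_search([0.0, 0.0], points, 3, is_english))
```

With a selective filter, the post-filter variant returns fewer than k results because most of the nearest neighbours are discarded after the fact. That is the failure mode to test for when you benchmark any engine with your real filters.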
Milvus. The right answer at billion-scale or when you have multiple teams sharing one vector platform. Disaggregated architecture - coordinator, query nodes, data nodes, separate object store for cold data - so you scale read and write paths independently. The price is operational weight: you are running a small distributed system. Do not pick Milvus to serve a 5M-vector RAG demo.
Weaviate. Strongest when you want a schema, native hybrid search (BM25 + vector + RRF in one query), and built-in modules for embedding generation. The GraphQL API is divisive; the data model is opinionated. Picks itself when your retrieval problem has rich filters and you want one query language for both lexical and semantic.
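For readers who have not met RRF (reciprocal rank fusion), a minimal sketch follows. It shows the textbook formula, where each result list contributes 1 / (k + rank) per document, with k=60 as a commonly used constant; it is an illustration only, and Weaviate's own fusion implementation may differ in detail.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists: each list contributes 1 / (k + rank), rank starting at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]    # hypothetical lexical ranking
    vector_hits = ["c", "a", "d"]  # hypothetical semantic ranking
    print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
```

Because RRF only looks at ranks, it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.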
LanceDB. The embedded option. The Lance columnar format sits in object storage (S3, GCS, local disk), and the LanceDB client reads it directly. No server to run. Perfect for notebook workflows, edge deployments, and ML pipelines where the "database" is just versioned files. Stops fitting the moment you need high-concurrency online serving.
The decision table
| You have | Pick | Why not the others |
|---|---|---|
| Postgres in production, <10M vectors | pgvector | Adding a second DB is the most expensive optimisation you can do |
| Dedicated vector workload, 10M-500M vectors | Qdrant | Milvus is overkill; pgvector starts to struggle |
| Billion+ vectors or multi-tenant platform | Milvus | The only one that scales the read and write paths independently |
| RAG with heavy filters + hybrid lexical | Weaviate | RRF and BM25 are native, not bolted on |
| Edge / notebook / ML pipeline | LanceDB | The only one that does not need a server |
| Need pgvector but with HNSW at scale | pgvecto.rs or Qdrant | pgvector's HNSW is fine until it isn't |
The operational tax nobody mentions
Every dedicated vector DB you add is:
- Another service in your monitoring stack.
- Another set of backups, with a different restore procedure than your primary DB.
- Another upgrade path - vector DBs are young and breaking changes still happen.
- Another consistency story - syncing the source of truth (Postgres) to the vector index requires either dual-writes, CDC, or a periodic reindex job. All three have failure modes.
If your team is five engineers and you do not yet have a clear scaling pain, the right answer is almost always "use pgvector and revisit when you cross 10M rows or 200 ms p95."
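To make the consistency cost concrete, here is a minimal sketch of the planning step behind a periodic reindex job. It assumes each row carries some version marker (an `updated_at` timestamp or a counter) that is also stored alongside the vector; `plan_reindex` and both input maps are hypothetical, and the actual reads and writes against Postgres and the vector store are left out.

```python
def plan_reindex(source_versions, index_versions):
    """Compare id -> version maps and return what the vector index must change.

    source_versions: rows in the source of truth, keyed by id.
    index_versions:  what the vector index currently holds for each id.
    """
    # Re-embed anything missing from the index or stored at a stale version.
    upserts = sorted(
        doc_id for doc_id, version in source_versions.items()
        if index_versions.get(doc_id) != version
    )
    # Remove vectors whose source row no longer exists.
    deletes = sorted(doc_id for doc_id in index_versions if doc_id not in source_versions)
    return upserts, deletes


if __name__ == "__main__":
    source = {1: 3, 2: 1, 4: 7}
    index = {1: 3, 2: 0, 3: 5}
    print(plan_reindex(source, index))
```

Even this small job has the failure modes mentioned above: between runs the index serves stale or deleted rows, and a crash halfway through leaves it partially updated. With pgvector the problem does not arise, because the vector lives in the same transaction as the row.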
When it falls down
- You take the ANN benchmarks at face value. They typically run on fixed public datasets (SIFT, GloVe, and similar) with no filters and no concurrent writes. Your workload looks nothing like that. Always benchmark on a sample of your real corpus with realistic filters.
- You under-budget memory for HNSW. A 10M-vector, 1024-dim float32 corpus is about 41 GB raw (10M x 1024 x 4 bytes), and the HNSW graph sits on top of that - count on 100+ GB resident. OOM in production is not a feature.
- You forget recall is a function of ef_search/nprobe. Default settings often give 85-90% recall, which sounds fine until your users see the 10-15% missing. Tune for your recall target before declaring victory.
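Tuning for a recall target requires measuring recall first. A minimal sketch, assuming you can run the same queries through both the approximate index and an exact (brute-force) search on a sample of your corpus; the function names are illustrative.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k that the approximate search also returned."""
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact_ids must contain at least one id")
    return len(truth & set(approx_ids[:k])) / len(truth)


def mean_recall(approx_results, exact_results, k):
    """Average recall@k over a batch of queries (one id list per query)."""
    pairs = list(zip(approx_results, exact_results))
    if not pairs:
        raise ValueError("need at least one query")
    return sum(recall_at_k(a, e, k) for a, e in pairs) / len(pairs)


if __name__ == "__main__":
    approx = [[1, 2, 3, 9], [5, 6, 7, 8]]
    exact = [[1, 2, 3, 4], [5, 6, 7, 8]]
    print(mean_recall(approx, exact, k=4))
```

Sweep ef_search or nprobe, record mean recall and p95 latency at each setting, and pick the cheapest setting that clears your target.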
Further reading
- pgvector on GitHub - the extension itself, with the HNSW vs IVFFlat trade-off table in the README.
- Qdrant documentation - clean docs, especially the filtering and quantisation sections.
- Milvus on GitHub - architecture diagrams worth studying before you commit.
- Weaviate developer docs - native hybrid search and schema-driven modules.
- Efficient and robust approximate nearest neighbor search using HNSW - Malkov & Yashunin - the original HNSW paper.
- Pinecone HNSW explainer - readable visual walkthrough of how the graph layers work.