LLM Systems

Vector Databases Compared - pgvector, Qdrant, Milvus, Weaviate, LanceDB

A practitioner's guide to picking a vector store, weighing index trade-offs against the operational cost of running yet another database alongside your primary store.

intermediate · 9 min read

The vector database market spent 2023-2024 colonising every infrastructure team's roadmap. Most of them did not need a new database; they needed an index on top of the one they already had. The serious question is not "which vector DB is fastest" - all the mature options are within a small constant factor of each other on the ANN benchmarks - but "what is the smallest piece of infrastructure that solves my retrieval problem for the next two years."

What the five contenders actually are

| Database | Architecture | Strongest at | Operational shape |
| --- | --- | --- | --- |
| pgvector | Postgres extension | Hybrid SQL + vector workloads, <10M vectors | Zero new infra - it's already Postgres |
| Qdrant | Standalone Rust server | Dedicated vector search, payload filtering | One more service, one more set of backups |
| Milvus | Distributed cloud-native | Billion-scale, high-QPS, horizontal scaling | Heaviest - separate coordinator, query, data nodes |
| Weaviate | Standalone Go server with GraphQL | Schema-driven RAG, native hybrid search | Mid-weight, opinionated data model |
| LanceDB | Embedded columnar (Lance format) | Edge, notebooks, ML workflows on object storage | No server - library that reads parquet-like files |

HNSW vs IVF - the index trade-off everyone hits

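HNSW (Hierarchical Navigable Small World) and IVF (Inverted File) are the two index families you will meet in practice. pgvector and Milvus ship both, Qdrant and Weaviate are built around HNSW, and LanceDB is built around IVF with product quantisation. The choice matters more than the database choice for most teams.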

HNSW. A multi-layer proximity graph (Malkov & Yashunin 2016). Excellent recall-vs-latency, no training phase, supports dynamic inserts. Costs: high memory (the graph lives in RAM, typically 1.5-3x the raw vector bytes) and slow builds. This is the right default for under 50M vectors with low write rates.
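The memory cost is easy to estimate before you commit. The sketch below applies the 1.5-3x multiplier from the paragraph above to the raw float32 size; the function name `hnsw_memory_gb` is hypothetical, and the multiplier is a planning heuristic, not a measurement of any particular engine.

```python
def hnsw_memory_gb(n_vectors: int, dim: int, bytes_per_value: int = 4,
                   overhead: tuple[float, float] = (1.5, 3.0)):
    """Return (raw_gb, low_gb, high_gb) in decimal gigabytes.

    `overhead` is the assumed 1.5-3x resident-memory multiplier for HNSW.
    """
    raw_gb = n_vectors * dim * bytes_per_value / 1e9
    low, high = overhead
    return raw_gb, raw_gb * low, raw_gb * high


# 10M vectors at 1024 float32 dimensions
raw, low, high = hnsw_memory_gb(10_000_000, 1024)
print(f"raw={raw:.1f} GB, budget {low:.1f}-{high:.1f} GB resident")
```

Run it with your own row count and embedding width before sizing a box; halving the dimension or storing float16 halves every number it prints.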

IVF (IVFFlat, IVF_PQ). Cluster the corpus into N partitions (lists), then search only the nprobe partitions closest to the query. Cheap to build, low memory, easy to shard. Recall falls as nprobe shrinks, because true neighbours that sit in unprobed partitions are never seen, and you need enough data to cluster meaningfully (sqrt(rows) lists is the rule of thumb). Right for 50M+ vectors where HNSW's memory cost stops fitting on a sensible box.
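To make the nprobe trade-off concrete, here is a toy IVF index in plain Python. It is a sketch under simplifying assumptions: a few rounds of Lloyd's k-means, squared Euclidean distance, random Gaussian data, and hypothetical helper names (`build_ivf`, `search_ivf`). Real engines do the same thing with optimised clustering and SIMD distance kernels.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors, centroids):
    """Put each vector index into the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: dist2(v, centroids[c]))
        lists[nearest].append(i)
    return lists


def build_ivf(vectors, n_lists, iters=5, seed=0):
    """Toy k-means: returns (centroids, lists of vector indices)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_lists)]
    for _ in range(iters):
        for c, members in enumerate(assign(vectors, centroids)):
            if members:  # an empty list keeps its previous centroid
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*(vectors[i] for i in members))]
    return centroids, assign(vectors, centroids)


def search_ivf(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)),
                   key=lambda c: dist2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    return sorted(candidates, key=lambda i: dist2(query, vectors[i]))[:k]


rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(1000)]
n_lists = round(math.sqrt(len(data)))  # sqrt(rows) rule of thumb
centroids, lists = build_ivf(data, n_lists)

query = [rng.gauss(0, 1) for _ in range(8)]
exact = sorted(range(len(data)), key=lambda i: dist2(query, data[i]))[:10]
for nprobe in (1, 4, n_lists):
    approx = search_ivf(query, data, centroids, lists, k=10, nprobe=nprobe)
    print(nprobe, len(set(approx) & set(exact)) / 10)  # recall@10
```

With nprobe equal to the number of lists the search degenerates into a full scan and recall is 1.0 by construction; the interesting part is how quickly recall climbs at small nprobe on your own data.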

Product Quantisation (PQ) on top. Both index families can compress vectors via PQ - split each vector into M sub-vectors and replace each with a codebook id. 8-32x memory reduction, 1-5 points of recall loss. Standard at billion scale.
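The 8-32x range falls straight out of the arithmetic. The sketch below assumes float32 input and one 8-bit code per sub-vector (256 centroids per codebook), and ignores the small fixed cost of storing the codebooks themselves; `pq_compression` is a hypothetical helper name.

```python
def pq_compression(dim: int, m: int, bits_per_code: int = 8,
                   bytes_per_value: int = 4):
    """Return (raw_bytes, code_bytes, ratio) for one vector.

    dim: vector dimensionality; m: number of sub-vectors.
    Codebook storage is deliberately left out of the estimate.
    """
    if dim % m != 0:
        raise ValueError("dim must be divisible by the number of sub-vectors m")
    raw_bytes = dim * bytes_per_value
    code_bytes = m * bits_per_code / 8
    return raw_bytes, code_bytes, raw_bytes / code_bytes


for m in (512, 256, 128):
    raw, code, ratio = pq_compression(1024, m)
    print(f"M={m}: {raw} B -> {code:.0f} B per vector ({ratio:.0f}x)")
```

Fewer sub-vectors means more compression and more recall loss, so M is the knob to sweep when you benchmark.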

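# Picking an index: a rule-of-thumb sketch (Python)
def pick_index(vectors: int, write_qps: float) -> str:
    """Thresholds are rough guides taken from the text, not hard limits."""
    if vectors < 50_000_000 and write_qps < 100:
        return "HNSW"        # default - recall and latency win
    if vectors < 100_000_000:
        return "IVF_FLAT"    # build cheap, search OK
    return "IVF_PQ"          # only thing that fits the budget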

When each one makes sense

pgvector. If you already run Postgres, this is your starting point. One extension, transactional inserts, joins against your existing tables, and your DBA already knows how to back it up. The cost is that you have to be honest about scale: HNSW in pgvector is slower to build than in dedicated engines, and the index competes with OLTP traffic for the same shared buffers. Comfortable up to ~10M vectors and ~100 QPS. Beyond that, the dedicated stores pull ahead.
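As a rough illustration of how little is involved, the block below only builds the SQL you would send through your usual Postgres driver; it opens no connection. The `vector` type, the `hnsw` index method, the `vector_cosine_ops` operator class, the `<=>` cosine-distance operator and the `hnsw.ef_search` setting are pgvector's own. The table names (`documents`, `doc_embeddings`), the `tenant_id` column and the `%s` placeholder style are assumptions made for the example.

```python
def to_vector_literal(values):
    """Format a sequence as pgvector's text input, e.g. [1.0,0.5,0.0]."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


DIM = 3  # tiny on purpose; use your embedding width in practice

SETUP_SQL = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    # Hypothetical side table keyed to an existing documents table.
    f"CREATE TABLE doc_embeddings ("
    f"doc_id bigint PRIMARY KEY REFERENCES documents (id), "
    f"embedding vector({DIM}))",
    "CREATE INDEX ON doc_embeddings USING hnsw (embedding vector_cosine_ops)",
]

# Raise the HNSW search breadth for this session (trades latency for recall).
TUNE_SQL = "SET hnsw.ef_search = 100"

# Nearest neighbours joined against ordinary relational data.
# Parameters: (query vector literal, tenant id).
QUERY_SQL = (
    "SELECT d.id, d.title, e.embedding <=> %s::vector AS distance "
    "FROM doc_embeddings e JOIN documents d ON d.id = e.doc_id "
    "WHERE d.tenant_id = %s "
    "ORDER BY distance LIMIT 5"
)

params = (to_vector_literal([1, 0.5, 0]), 42)
print(params[0])
```

The join and the WHERE clause are the point: the filter lives in the same transaction and the same query planner as the rest of your data, which is the thing a separate vector store cannot give you.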

Qdrant. The right next step when pgvector stops fitting. Rust server, payload filtering that runs alongside the HNSW traversal (not as a post-filter), and a clean REST/gRPC API. Quantisation (scalar and binary) is first-class. Operationally a single-binary service with a snapshot story that does not require a separate cluster. The sweet spot is "we need a real vector DB but we are not Spotify."
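Why filtering during the search matters is easiest to see with a toy example. The sketch below is not Qdrant's algorithm: brute-force sorting stands in for the HNSW traversal, and the point layout (`id`, `vector`, `payload`) is a hypothetical structure chosen for illustration. It only shows the difference in results between filtering after top-k selection and filtering while candidates are chosen.

```python
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def post_filter_search(query, points, k, predicate):
    """Take the k nearest points first, then drop the ones that fail the filter."""
    top = sorted(points, key=lambda p: dist2(query, p["vector"]))[:k]
    return [p["id"] for p in top if predicate(p["payload"])]


def filtered_search(query, points, k, predicate):
    """Apply the filter while selecting candidates, so up to k matches come back."""
    matching = (p for p in points if predicate(p["payload"]))
    nearest = sorted(matching, key=lambda p: dist2(query, p["vector"]))[:k]
    return [p["id"] for p in nearest]


# Twelve points on a line; only every fourth one is English.
points = [
    {"id": i, "vector": [float(i)], "payload": {"lang": "en" if i % 4 == 0 else "de"}}
    for i in range(12)
]


def is_english(payload):
    return payload["lang"] == "en"


print(post_filter_search([0.0], points, k=3, predicate=is_english))  # [0]
print(filtered_search([0.0], points, k=3, predicate=is_english))     # [0, 4, 8]
```

With a selective filter, post-filtering returns fewer than k results unless you over-fetch, and over-fetching is exactly the cost that filter-aware traversal avoids.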

Milvus. The right answer at billion-scale or when you have multiple teams sharing one vector platform. Disaggregated architecture - coordinator, query nodes, data nodes, separate object store for cold data - so you scale read and write paths independently. The price is operational weight: you are running a small distributed system. Do not pick Milvus to serve a 5M-vector RAG demo.

Weaviate. Strongest when you want a schema, native hybrid search (BM25 + vector + RRF in one query), and built-in modules for embedding generation. The GraphQL API is divisive; the data model is opinionated. Picks itself when your retrieval problem has rich filters and you want one query language for both lexical and semantic.
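Reciprocal rank fusion (RRF) itself is only a few lines. The sketch below is a generic implementation, not Weaviate's code: each document scores the sum of 1 / (k + rank) over the rankings it appears in, with k = 60 as the conventional constant. The function name `rrf` and the tie-break on document id are choices made for the example.

```python
def rrf(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    rankings: iterable of lists, each ordered best-first.
    k: damping constant; larger values flatten the gap between ranks.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, then by id so ties are deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["a", "b", "c"]    # lexical ranking
vector_hits = ["c", "a", "d"]  # semantic ranking
print(rrf([bm25_hits, vector_hits]))  # ['a', 'c', 'b', 'd']
```

Because RRF uses ranks instead of raw scores, it needs no score normalisation between BM25 and cosine similarity, which is why it is a common default for hybrid search.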

LanceDB. The embedded option. The Lance columnar format sits in object storage (S3, GCS, local disk), and the LanceDB client reads it directly. No server to run. Perfect for notebook workflows, edge deployments, and ML pipelines where the "database" is just versioned files. Stops fitting the moment you need high-concurrency online serving.

The decision table

| You have | Pick | Why not the others |
| --- | --- | --- |
| Postgres in production, <10M vectors | pgvector | Adding a second DB is the most expensive optimisation you can do |
| Dedicated vector workload, 10M-500M vectors | Qdrant | Milvus is overkill; pgvector starts to struggle |
| Billion+ vectors or multi-tenant platform | Milvus | The only one that scales the read and write paths independently |
| RAG with heavy filters + hybrid lexical | Weaviate | RRF and BM25 are native, not bolted on |
| Edge / notebook / ML pipeline | LanceDB | The only one that does not need a server |
| Need pgvector but with HNSW at scale | pgvecto.rs or Qdrant | pgvector's HNSW is fine until it isn't |

The operational tax nobody mentions

Every dedicated vector DB you add is:

  • Another service in your monitoring stack.
  • Another set of backups, with a different restore procedure than your primary DB.
  • Another upgrade path - vector DBs are young and breaking changes still happen.
  • Another consistency story - syncing the source of truth (Postgres) to the vector index requires either dual-writes, CDC, or a periodic reindex job. All three have failure modes.
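Of the three sync strategies, the periodic reindex job is the simplest to reason about. A minimal sketch of its planning step follows, assuming you can read a content hash per id from both sides; the names `content_hash` and `plan_reindex` and the dict-based inputs are hypothetical, and the actual embedding and upsert calls are left out.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of the text that gets embedded."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(source_rows: dict[int, str], indexed_hashes: dict[int, str]):
    """Compare the source of truth with the vector index.

    source_rows: id -> current text in the primary database.
    indexed_hashes: id -> hash stored alongside each vector at index time.
    Returns (ids to embed and upsert, ids to delete from the index).
    """
    to_upsert = sorted(
        i for i, text in source_rows.items()
        if indexed_hashes.get(i) != content_hash(text)
    )
    to_delete = sorted(i for i in indexed_hashes if i not in source_rows)
    return to_upsert, to_delete


source = {1: "alpha", 2: "beta v2", 3: "gamma"}
index = {1: content_hash("alpha"), 2: content_hash("beta"), 4: content_hash("delta")}
print(plan_reindex(source, index))  # ([2, 3], [4])
```

Its failure mode is staleness: anything changed since the last run is invisible to search until the next one, so the job interval is effectively your consistency guarantee.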

If your team is five engineers and you do not yet have a clear scaling pain, the right answer is almost always "use pgvector and revisit when you cross 10M rows or 200 ms p95."

When it falls down

  • You believe the ANN benchmarks at face value. They mostly run on fixed public datasets such as SIFT and GloVe, with no filters and no concurrent writes. Your workload looks nothing like that. Always benchmark on a sample of your real corpus with realistic filters.
  • You under-budget memory for HNSW. A 10M-vector, 1024-dim float32 corpus is about 41 GB raw (10M x 1024 x 4 bytes). Add the HNSW graph and working memory at 1.5-3x and you should budget roughly 60-120 GB resident. OOM in production is not a feature.
  • You forget recall is a function of ef_search / nprobe. Default settings often give 85-90% recall, which sounds fine until the result your user needed is in the missing 10-15%. Tune for your recall target before declaring victory.
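Recall is cheap to measure once you have exact neighbours for a sample of queries, usually from a one-off brute-force scan. A minimal sketch, with hypothetical function names and hand-written id lists standing in for real search results:

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k that the approximate search returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


def mean_recall(pairs, k):
    """Average recall@k over (approx_ids, exact_ids) pairs, one per query."""
    pairs = list(pairs)
    return sum(recall_at_k(a, e, k) for a, e in pairs) / len(pairs)


results = [
    ([1, 2, 3, 4, 9], [1, 2, 3, 4, 5]),       # one true neighbour missed
    ([7, 8, 9, 10, 11], [7, 8, 9, 10, 11]),   # perfect
]
print(mean_recall(results, k=5))
```

Sweep ef_search or nprobe, record mean recall and p95 latency at each setting, and pick the cheapest setting that clears your recall target.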

Further reading