Vector Database Comparison: pgvector, Qdrant, Milvus in 2026
The vector database decision is now operational, not architectural. The three open-source options — pgvector, Qdrant, Milvus — all work for the workloads our customers run. The right pick depends on scale, the customer's existing ops capability, and which features they actually use. Here's the decision framework we apply on every engagement.
We have deployed RAG into production at 47 enterprise customers across BFSI, healthcare, government and pharma: pgvector at 31, Qdrant at 12 and Milvus at 4. That distribution follows our scale rule rather than any ranking of the three. The most frequent question in the architecture-review session is which one to pick, and the honest answer we give every CIO is that the decision is operational, not performance-driven. Three years into the enterprise RAG era, none of the three is clearly better than the others on first-order performance benchmarks at the scales most enterprises operate at. The wrong question is "which is fastest"; the right question is "which is least operational work for our team to run." What follows is the rule we'd give a CIO who's evaluating the three at 11pm before tomorrow's investment committee.
The decision rule we actually apply
Under 10M chunks: pgvector. Between 10M and 100M chunks: Qdrant. Above 100M chunks: Milvus or distributed Qdrant. Below 10M chunks pgvector wins because operational simplicity dominates — most of our enterprise customers already run Postgres, the team already knows how to back it up, monitor it, and version-control its schema. Adding pgvector to an existing Postgres deployment is a one-line extension activation; adding a separate vector database is one more system to back up, monitor, alert on, secure, and explain to the customer's operations team. Below the scale where dedicated vector databases start to materially outperform Postgres, the operational tax isn't worth it.
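The rule above is simple enough to write down as a function. This is a minimal sketch: the function name and return strings are ours for illustration, and the handling of the exact 10M and 100M boundaries (both treated as Qdrant territory) is an assumption, since the rule only says "under", "between" and "above".

```python
def pick_vector_db(chunk_count: int) -> str:
    """Map a chunk count to the default choice under the scale rule.

    Boundary handling is an assumption: exactly 10M and exactly 100M
    are both treated as falling in the Qdrant band.
    """
    if chunk_count < 0:
        raise ValueError("chunk_count must be non-negative")
    if chunk_count < 10_000_000:
        return "pgvector"
    if chunk_count <= 100_000_000:
        return "qdrant"
    return "milvus or distributed qdrant"


if __name__ == "__main__":
    for n in (2_000_000, 40_000_000, 250_000_000):
        print(f"{n:>12,} chunks -> {pick_vector_db(n)}")
```

Scale is only the first input. Existing ops capability and feature needs can still override the default this returns.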
Why pgvector wins below 10M
Three reasons. One: it's already in your infrastructure. Postgres is the de facto enterprise relational database; pgvector is a Postgres extension that ships with most managed Postgres offerings and is trivial to install on self-hosted ones. Two: ACID + vectors. The team can join vector search results to other tables in the same query, in the same transaction, with no cross-system consistency considerations. Three: the operations team already knows it. Backups, replication, failover, monitoring, security — every operational pattern Postgres has accumulated over 30 years applies. The features pgvector is missing relative to Qdrant or Milvus (advanced payload filtering, GPU-accelerated indexing, sharding at very large scale) don't matter below 10M chunks for the workloads our customers run.
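To make the "ACID + vectors" point concrete, here is a hedged sketch of a single query that joins vector search results to a relational table. The table and column names (`chunks`, `documents`, `business_unit`) are hypothetical, and the code assumes a DB-API driver with `%s` placeholders, such as psycopg. The `<=>` operator is pgvector's cosine-distance operator.

```python
from typing import Any, List, Sequence, Tuple

# Hypothetical schema: chunks(id, document_id, body, embedding vector(n))
# and documents(id, title, business_unit).
SEARCH_SQL = """
SELECT c.id, c.body, d.title
FROM chunks AS c
JOIN documents AS d ON d.id = c.document_id
WHERE d.business_unit = %s
ORDER BY c.embedding <=> %s::vector
LIMIT %s
"""


def to_vector_literal(vec: Sequence[float]) -> str:
    """Render a Python sequence in pgvector's text format, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(x) for x in vec) + "]"


def search_chunks(cursor: Any, query_vec: Sequence[float],
                  business_unit: str, k: int = 30) -> List[Tuple]:
    """Run the join + nearest-neighbour query on an open DB-API cursor."""
    cursor.execute(SEARCH_SQL, (business_unit, to_vector_literal(query_vec), k))
    return cursor.fetchall()
```

Because the filter, the join and the similarity ordering all run in one statement, they see the same transaction snapshot. That is the consistency property a separate vector store cannot give you without extra work.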
When to graduate to Qdrant
Qdrant's sweet spot in our deployments is 10–100M chunks with heavy payload filtering. "Heavy payload filtering" means the customer's queries are routinely filtered by attribute (document type, business unit, jurisdiction, effective date) before vector search runs. Qdrant's filtered HNSW implementation is materially faster than pgvector's at this scale and with this query pattern. Qdrant's snapshot-and-restore is faster and cleaner than pg_dump on a large vector index, which matters operationally. The Rust runtime is light on memory and fast to start. For customers where the vector workload is a meaningful share of their AI compute, Qdrant's smaller operational footprint earns its way in.
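As an illustration of what an attribute-filtered query looks like, the sketch below builds a request body in the shape of Qdrant's REST search API (a `vector`, a `filter` with `must` conditions, and a `limit`). The helper name and the payload keys are hypothetical, and the exact endpoint and field names should be checked against the Qdrant version you run.

```python
from typing import Any, Dict, Mapping, Sequence


def build_filtered_search(query_vec: Sequence[float],
                          exact_filters: Mapping[str, str],
                          limit: int = 30) -> Dict[str, Any]:
    """Build a search body where every filter must match exactly.

    Each (key, value) pair becomes one `must` condition on a payload field.
    """
    must = [{"key": key, "match": {"value": value}}
            for key, value in exact_filters.items()]
    return {
        "vector": list(query_vec),
        "filter": {"must": must},
        "limit": limit,
        "with_payload": True,
    }


if __name__ == "__main__":
    body = build_filtered_search(
        [0.12, -0.03, 0.44],
        {"document_type": "policy", "jurisdiction": "UK"},
    )
    print(body)
```

The body would be posted to the collection's search endpoint. Keeping the filter construction in one helper also makes it easy to log the filter combinations your queries actually use.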
When to graduate to Milvus
Milvus is the choice above 100M chunks, or where the customer needs GPU-accelerated indexing for very high write rates, or where the deployment requires multi-cluster / multi-region replication out of the box. Milvus is heavier to operate than Qdrant — there are more moving parts (etcd, MinIO, multiple coordinator processes), the cluster setup has more failure modes, and the operational documentation requires more attention. At the scales where Milvus is the right answer, the customer typically has dedicated platform operations and the additional complexity is acceptable; at smaller scales the complexity tax is real.
Features that matter less than the benchmarks suggest
Three feature differences appear prominently in vendor comparisons but rarely matter in our production deployments. First, query latency at sub-100ms tiers: every production option achieves this on reasonable hardware at the scales we deploy at, and the latency differences between options are dwarfed by retrieval-ranking and LLM inference latencies. Second, recall at very high top-K: most production RAG pipelines retrieve top-30 and re-rank to top-5, and the recall differences between options at K=30 are negligible. Third, hybrid sparse-dense fusion: every option now supports it, and although the implementations differ in detail they produce broadly similar quality.
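The retrieve-30, re-rank-to-5 pattern mentioned above can be sketched in a few lines. This is illustrative only: `rerank_score` stands in for whatever re-ranker the pipeline uses (a cross-encoder, for example), and the candidates are assumed to arrive already ordered by vector similarity.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def retrieve_then_rerank(candidates: Sequence[T],
                         rerank_score: Callable[[T], float],
                         retrieve_k: int = 30,
                         final_k: int = 5) -> List[T]:
    """Keep the first `retrieve_k` candidates, re-score them, return the best `final_k`."""
    shortlist = list(candidates[:retrieve_k])
    ranked = sorted(shortlist, key=rerank_score, reverse=True)
    return ranked[:final_k]


if __name__ == "__main__":
    # Toy data: candidate ids 0..99 in vector-similarity order,
    # with a stand-in re-ranker that happens to prefer higher ids.
    print(retrieve_then_rerank(list(range(100)), rerank_score=float))
```

Only the shortlist of 30 ever reaches the re-ranker, which is why small recall differences between databases at K=30 tend to wash out in the final top-5.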
Features that matter more than benchmarks suggest
Three operational features matter more than feature comparisons usually highlight. First, payload-filter performance at the actual filter cardinalities your queries use: if your queries always filter by document_type and document_type has 8 distinct values, you want the database whose filtered query path is fast at that cardinality, which isn't always the database that wins benchmarks run on less realistic distributions. Second, backup and restore time: when you need to restore a 50M-vector index for compliance reasons, the difference between a 30-minute snapshot restore and a 6-hour pg_dump replay is operationally decisive. Third, schema and index migration semantics: when the customer wants to change the embedding dimension or the index parameters, the database whose migration is online and safe matters substantially more than the one with marginally better benchmarks.
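One way to test filter performance at your own cardinalities is to time the real filtered query once per filter value. The harness below is a minimal sketch: `run_query` is a hypothetical callable that you supply, which executes one filtered search against the database under test.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def measure_filtered_latency(run_query: Callable[[str], object],
                             filter_values: Iterable[str],
                             repeats: int = 20) -> Dict[str, float]:
    """Return the median latency in milliseconds for each filter value."""
    results: Dict[str, float] = {}
    for value in filter_values:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(value)  # one filtered search for this value
            samples.append((time.perf_counter() - start) * 1000.0)
        results[value] = statistics.median(samples)
    return results


if __name__ == "__main__":
    # Stand-in query that just sleeps; replace with a real client call.
    fake = lambda value: time.sleep(0.001)
    print(measure_filtered_latency(fake, ["policy", "contract"], repeats=5))
```

Running this with the eight real document_type values, against a copy of production data, says more about your workload than a published benchmark at a different distribution.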
The practical recommendation
Start on pgvector unless you have a clear reason not to. Migrate to Qdrant when you cross 10M chunks AND have heavy payload-filter requirements. Migrate to Milvus only at very high scale or specific feature needs. Don't over-engineer the choice upfront — the migration from pgvector to Qdrant is straightforward (the embedding model is unchanged, the data export is one query, the retrieval API is abstracted behind a single interface in any sensible RAG architecture), so the cost of changing your mind later is bounded. See /enterprise-rag for the full RAG stack architecture and /glossary#vector-database for the underlying concepts.
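The "single interface" that keeps migration cheap might look like the sketch below. All names here (`Retriever`, `Hit`, `InMemoryRetriever`) are hypothetical, and the in-memory implementation exists only to show that callers depend on the interface rather than on pgvector or Qdrant.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Mapping, Protocol, Sequence, Tuple


@dataclass(frozen=True)
class Hit:
    chunk_id: str
    score: float


class Retriever(Protocol):
    """What the RAG pipeline depends on; one adapter per backend."""

    def search(self, query: Sequence[float], k: int,
               filters: Mapping[str, str]) -> List[Hit]: ...


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryRetriever:
    """Toy backend: brute-force cosine similarity over rows held in memory."""

    def __init__(self, rows: Sequence[Tuple[str, Sequence[float], Dict[str, str]]]):
        self._rows = list(rows)  # (chunk_id, vector, payload)

    def search(self, query: Sequence[float], k: int,
               filters: Mapping[str, str]) -> List[Hit]:
        hits = [
            Hit(chunk_id, _cosine(query, vec))
            for chunk_id, vec, payload in self._rows
            if all(payload.get(key) == val for key, val in filters.items())
        ]
        return sorted(hits, key=lambda h: h.score, reverse=True)[:k]


def top_chunk_ids(retriever: Retriever, query: Sequence[float], k: int = 5) -> List[str]:
    """Pipeline code sees only the Retriever interface."""
    return [hit.chunk_id for hit in retriever.search(query, k, {})]
```

A pgvector adapter and a Qdrant adapter would each implement `search`. Swapping backends then means changing which adapter is constructed, not the pipeline code that calls it.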
MindMap Engineering
MindMap Engineering is the collective practice behind 117 production-deployed AI accelerators across BFSI, healthcare, government, retail and telecom. The pieces published here are written by the engineering leads who shipped the systems they describe — sovereign LLM platforms, RAG pipelines, agentic workflows, IDP systems — at customer sites across three continents. We don't write about architectures we haven't deployed.