What is a reranker?

A reranker is a model that takes a query and a set of candidate documents and reorders them by relevance. It usually runs as a second stage after fast retrieval, scoring each query–document pair jointly with a cross-encoder for higher precision.

Why use a rerank model in RAG?

Vector search returns roughly relevant chunks, but the order is often imperfect. A rerank model re-scores the top candidates so the most relevant passages land at the top of the prompt, improving answer quality and letting you send fewer tokens to the LLM.
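To make the "fewer tokens" point concrete, here is a minimal sketch of packing reranked passages into a fixed prompt budget. The `ScoredPassage` shape, the `selectForPrompt` helper and the four-characters-per-token estimate are all assumptions for illustration; a real pipeline would count tokens with the target LLM's tokenizer.

```typescript
// Hypothetical shape for a passage that a reranker has already scored.
interface ScoredPassage {
  text: string;
  score: number; // higher means more relevant
}

// Rough token estimate (assumption: about 4 characters per token).
// Swap in the real tokenizer of your LLM for accurate counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Sort by descending score, then keep passages in order until the budget is spent.
function selectForPrompt(passages: ScoredPassage[], maxTokens: number): ScoredPassage[] {
  const ranked = [...passages].sort((a, b) => b.score - a.score);
  const kept: ScoredPassage[] = [];
  let used = 0;
  for (const passage of ranked) {
    const cost = estimateTokens(passage.text);
    if (used + cost > maxTokens) break; // stop at the first passage that does not fit
    kept.push(passage);
    used += cost;
  }
  return kept;
}
```

Because the best passages come first, a tight budget trims the weakest candidates rather than an arbitrary slice of the retrieval results.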

Reranker · rerank model · rerank for RAG

Understand rerankers. Then watch one run in your browser.

A reranker re-scores your retrieved candidates so the most relevant passages rise to the top. Learn how it works, compare the popular models, and try a cross-encoder live — with zero API cost and nothing leaving your machine.

▶ Try the live demo · Start with the basics

Runs on transformers.js · 100% client-side · no key required

Start here

Three short guides take you from “what is a reranker?” to a working reranking stage in your RAG pipeline.

🧭

What is a reranker?

The two-stage retrieval pattern, why order matters, and where reranking fits.

Fundamentals

⚖️

Cross-encoder vs bi-encoder

Why bi-encoders are fast and cross-encoders are accurate — and how to use both.

Architecture

🔧

How to add reranking to RAG

Retrieve wide, rerank, keep the best. With code, top-k tips and latency trade-offs.

Practical
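As a taste of the architecture guide, the sketch below contrasts the two scoring shapes. A bi-encoder compares vectors that were embedded separately, while a cross-encoder scores the query and the document together. The names `cosineSimilarity`, `PairScorer` and `overlapScorer` are illustrative assumptions, and `overlapScorer` is a toy word-overlap stand-in, not a real model.

```typescript
// Bi-encoder side: each text is embedded on its own, so document vectors can be
// precomputed and compared to the query vector with cosine similarity.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Cross-encoder side: the scorer sees the query and the document together.
// Nothing can be precomputed, so it costs one model call per pair.
// Synchronous here for brevity; a real model call would normally be asynchronous.
type PairScorer = (query: string, document: string) => number;

// Toy stand-in for a cross-encoder (NOT a real model): counts shared lowercase words.
const overlapScorer: PairScorer = (query, document) => {
  const queryWords = query.toLowerCase().split(/\s+/);
  return document
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => queryWords.indexOf(word) !== -1).length;
};
```

The cost difference follows from the signatures: the vector comparison is cheap arithmetic over stored embeddings, while every call to a `PairScorer` needs both texts at query time.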

The fun part

Rerank in your browser, right now

Paste a query and a few candidate passages. A real cross-encoder downloads once, caches, and scores every pair locally — you watch the ranking reshuffle in milliseconds. No server, no API key, no data leaving the page.

Real model weights via transformers.js + ONNX Runtime Web
Zero API cost and zero abuse risk — it’s all on your device
See exactly how scores reorder your retrieval results
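For a sense of what "scores reorder your results" means, here is a small sketch that turns per-pair model outputs into a new ranking and records how far each passage moved. It assumes the cross-encoder emits one raw logit per query–passage pair and that a sigmoid maps it to a 0–1 relevance score, which is common but not universal. `rerankWithLogits` and `RankChange` are hypothetical names, not part of the demo's code.

```typescript
// Map a raw logit to a 0..1 relevance score.
function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

// Hypothetical record of how one passage moved after reranking.
interface RankChange {
  passage: string;
  score: number;
  oldRank: number; // 1-based position in the original retrieval order
  newRank: number; // 1-based position after sorting by score
}

// Assumption: logits[i] is the model output for passages[i] paired with the query.
function rerankWithLogits(passages: string[], logits: number[]): RankChange[] {
  if (passages.length !== logits.length) {
    throw new Error("expected one logit per passage");
  }
  return passages
    .map((passage, i) => ({ passage, score: sigmoid(logits[i] ?? 0), oldRank: i + 1 }))
    .sort((a, b) => b.score - a.score)
    .map((row, i) => ({ ...row, newRank: i + 1 }));
}
```

Comparing `oldRank` with `newRank` for each row shows which retrieved passages the reranker promoted and which it pushed down.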

Open the demo →

Compare the rerank models

Hosted APIs and open-weight models, side by side — quality, latency, languages and cost.

See the full comparison →

Reranking in one diagram

query ─┐
       ▼
┌───────────────┐   top 100   ┌────────────────┐   top 5   ┌─────┐
│  Retriever    │────────────▶│   Reranker     │──────────▶│ LLM │
│ (bi-encoder / │  candidates │ (cross-encoder │  best few │     │
│  BM25, fast)  │             │  scores pairs) │           └─────┘
└───────────────┘             └────────────────┘
  recall-oriented               precision-oriented

Retrieve wide for recall, rerank for precision, send only the best to the model.
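The diagram can be sketched in a few lines of code. This is a minimal illustration, not a definitive implementation: `Retriever`, `Reranker` and `retrieveAndRerank` are hypothetical names, both stages are passed in as plain functions, and everything is synchronous for brevity where a real retriever and reranker would usually be asynchronous. The defaults of 100 and 5 mirror the numbers in the diagram.

```typescript
interface Candidate {
  id: string;
  text: string;
}

// Stage 1 (recall-oriented): return up to `limit` candidates for the query.
type Retriever = (query: string, limit: number) => Candidate[];

// Stage 2 (precision-oriented): return one relevance score per candidate text.
type Reranker = (query: string, texts: string[]) => number[];

function retrieveAndRerank(
  query: string,
  retrieve: Retriever,
  rerank: Reranker,
  retrieveK: number = 100, // how wide to retrieve
  keepK: number = 5 // how many passages reach the LLM
): Candidate[] {
  const candidates = retrieve(query, retrieveK);
  const scores = rerank(query, candidates.map((c) => c.text));
  if (scores.length !== candidates.length) {
    throw new Error("reranker must return one score per candidate");
  }
  return candidates
    .map((candidate, i) => ({ candidate, score: scores[i] ?? Number.NEGATIVE_INFINITY }))
    .sort((a, b) => b.score - a.score) // best score first
    .slice(0, keepK)
    .map((entry) => entry.candidate);
}
```

Keeping the stages as injected functions means the same pipeline works whether the retriever is BM25 or a bi-encoder, and whether the reranker is a hosted API or a local cross-encoder.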


Models compared: bge-reranker · Cohere Rerank · Jina Reranker · Voyage Rerank
