bge-reranker
The bge-reranker family from BAAI (Beijing Academy of Artificial Intelligence) consists of open-weight cross-encoders trained specifically for passage reranking. They are a common default for self-hosted reranking: free to download, usable directly from sentence-transformers, and competitive with many commercial reranking APIs on public benchmarks such as BEIR.
Model variants
| Model | Parameters | Languages | Best for |
|---|---|---|---|
| BAAI/bge-reranker-base | 278M | Chinese, English | Fast CPU inference, good baseline |
| BAAI/bge-reranker-large | 560M | Chinese, English | Stronger quality than base |
| BAAI/bge-reranker-v2-m3 | 568M | 100+ langs | Best all-round; multilingual powerhouse |
| BAAI/bge-reranker-v2-gemma | 2.5B | 100+ langs | Highest quality; needs a GPU |
Recommendation: start with bge-reranker-v2-m3. It's multilingual, has strong BEIR numbers, and works fine on CPU for moderate traffic. Upgrade to the Gemma variant only if you have a GPU and need maximum quality.
Benchmarks
| Model | BEIR NDCG@10 (avg) | MS MARCO MRR@10 |
|---|---|---|
| bge-reranker-base | ~56.8 | ~39.0 |
| bge-reranker-large | ~58.4 | ~40.7 |
| bge-reranker-v2-m3 | ~60.1 | ~41.1 |
| bge-reranker-v2-gemma | ~61.8 | ~42.0 |
Scores are approximate averages across the 18 BEIR datasets. Results vary by dataset — always evaluate on your own domain.
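To evaluate on your own domain, you need the same metrics computed over your own labelled queries. The sketch below is a minimal pure-Python version of NDCG@10 and MRR@10 for a single query, assuming you have relevance judgements (`qrels`) mapping document IDs to graded gains; the function names are illustrative, not from any particular toolkit. It uses linear gains with a log2 rank discount; some toolkits use exponential gains (2^rel - 1) instead, which gives identical results for binary labels. Average the per-query values to get a benchmark-style number.

```python
import math
from typing import Dict, List, Sequence, Set


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k gains (rank 1 has discount log2(2) = 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids: List[str], qrels: Dict[str, float], k: int = 10) -> float:
    """NDCG@k for one query: DCG of the system ranking divided by DCG of the ideal ranking."""
    gains = [qrels.get(doc_id, 0.0) for doc_id in ranked_ids]
    ideal_dcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    # A query with no relevant documents contributes 0 rather than dividing by zero
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def mrr_at_k(ranked_ids: List[str], relevant: Set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant document within the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Example: the only relevant hit in the ranking ("d1") sits at rank 2
ranking = ["d3", "d1", "d7"]
print(mrr_at_k(ranking, {"d1", "d2"}))  # 0.5
print(round(ndcg_at_k(ranking, {"d1": 1.0, "d2": 1.0}), 4))
```

Run the first-stage retriever alone and then retriever plus reranker through the same functions to see whether the reranker actually helps on your data.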
Quick start
Install
```bash
pip install sentence-transformers
```
Rerank a list of passages
```python
from sentence_transformers import CrossEncoder

# max_length caps the combined query + passage input (in tokens); longer pairs are truncated
model = CrossEncoder("BAAI/bge-reranker-v2-m3", max_length=512)

query = "How do I add reranking to my RAG pipeline?"
passages = [
    "Rerankers score each query-passage pair with a cross-encoder.",
    "BM25 is a classical keyword-based retrieval method.",
    "London is the capital of the United Kingdom.",
    "Two-stage retrieval: retrieve 50 candidates, rerank to top 5.",
]

# Score every (query, passage) pair; higher scores mean more relevant
scores = model.predict([(query, p) for p in passages])

# Sort by score only (highest first) instead of comparing passage text on ties
ranked = sorted(zip(scores, passages), key=lambda pair: pair[0], reverse=True)
for score, text in ranked:
    print(f"{score:.4f} {text[:80]}")
```
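If you rerank in several places, it can help to wrap the scoring step in a small helper that returns only the top results. This is a minimal sketch, assuming any scorer with the shape of `CrossEncoder.predict` (a list of (query, passage) pairs in, one score per pair out). `rerank` and `overlap_score` are hypothetical names, and `overlap_score` is a crude word-overlap stand-in used only so the example runs without downloading a model.

```python
from typing import Callable, List, Sequence, Tuple

# Any callable that scores (query, passage) pairs, e.g. CrossEncoder.predict
ScoreFn = Callable[[List[Tuple[str, str]]], Sequence[float]]


def rerank(query: str, passages: List[str], score_fn: ScoreFn,
           top_k: int = 5) -> List[Tuple[float, str]]:
    """Score every passage against the query and return the top_k (score, passage) pairs."""
    scores = score_fn([(query, p) for p in passages])
    # Sort indices by score alone; Python's sort is stable (also with reverse=True),
    # so passages with equal scores keep their original input order
    order = sorted(range(len(passages)), key=lambda i: float(scores[i]), reverse=True)
    return [(float(scores[i]), passages[i]) for i in order[:top_k]]


def overlap_score(pairs: List[Tuple[str, str]]) -> List[int]:
    # Hypothetical stand-in for a cross-encoder: counts shared lowercase words
    return [len(set(q.lower().split()) & set(p.lower().split())) for q, p in pairs]


docs = ["BM25 retrieval", "rerank the top candidates", "the capital"]
print(rerank("rerank the candidates", docs, overlap_score, top_k=2))
# [(3.0, 'rerank the top candidates'), (1.0, 'the capital')]
```

With the real model, pass `model.predict` as `score_fn`. Recent sentence-transformers releases also provide `CrossEncoder.rank`, which returns ranked results directly if you do not need a model-agnostic helper.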
In a RAG pipeline
```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-v2-m3")

def rag_answer(query: str, vector_db, llm) -> str:
    # vector_db and llm are placeholders for your own clients; this assumes
    # vector_db.search() returns a list of passage strings and llm.complete() takes a prompt
    # Stage 1: retrieve wide
    candidates = vector_db.search(query, top_k=50)
    # Stage 2: rerank tight
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    top5 = [c for _, c in ranked[:5]]
    # Stage 3: generate
    context = "\n\n".join(top5)
    return llm.complete(f"Context:\n{context}\n\nQ: {query}")
```
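The RAG pipeline depends on a live vector database, model, and LLM. To check the wiring in isolation, the same three stages can be written against injected callables and exercised with toy stand-ins. This is a sketch under stated assumptions: `Retrieve`, `ScorePairs`, `Generate`, `toy_retrieve`, `toy_score`, and `echo_llm` are hypothetical names, and `toy_score` is a keyword-count stand-in, not a cross-encoder.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (assumptions, not any specific library's API)
Retrieve = Callable[[str, int], List[str]]                        # (query, top_k) -> passages
ScorePairs = Callable[[List[Tuple[str, str]]], Sequence[float]]   # e.g. CrossEncoder.predict
Generate = Callable[[str], str]                                   # prompt -> answer


def rag_answer(query: str, retrieve: Retrieve, score_pairs: ScorePairs,
               generate: Generate, wide_k: int = 50, final_k: int = 5) -> str:
    candidates = retrieve(query, wide_k)                      # stage 1: cheap, recall-oriented
    scores = score_pairs([(query, c) for c in candidates])    # stage 2: one score per candidate
    ranked = sorted(zip(scores, candidates), key=lambda sc: float(sc[0]), reverse=True)
    context = "\n\n".join(c for _, c in ranked[:final_k])
    return generate(f"Context:\n{context}\n\nQ: {query}")    # stage 3: generation


# Toy stand-ins so the wiring runs without a vector DB, model download, or LLM
CORPUS = [
    "Rerankers score each query-passage pair with a cross-encoder.",
    "BM25 is a classical keyword-based retrieval method.",
    "London is the capital of the United Kingdom.",
    "Two-stage retrieval: retrieve 50 candidates, rerank to top 5.",
]


def toy_retrieve(query: str, top_k: int) -> List[str]:
    return CORPUS[:top_k]


def toy_score(pairs: List[Tuple[str, str]]) -> List[float]:
    # Counts query words that occur as substrings of the passage
    return [float(sum(w in p.lower() for w in q.lower().split())) for q, p in pairs]


def echo_llm(prompt: str) -> str:
    return prompt  # returns the prompt so you can inspect which passages were kept


print(rag_answer("rerank candidates", toy_retrieve, toy_score, echo_llm, wide_k=4, final_k=2))
```

In production, pass `reranker.predict` as `score_pairs` and wrap your vector database and LLM clients in functions with these signatures. Keeping the stages injectable also makes it easy to assert that the scorer sees exactly `wide_k` pairs per query, which is what drives reranking cost.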
Pros and cons
Pros
- Completely free — no API key, no per-call cost
- Strong BEIR numbers competitive with commercial APIs
- Easy integration via sentence-transformers / HuggingFace
- Multilingual (v2-m3 covers 100+ languages)
- Full control: quantise, fine-tune, distil
- Active development from BAAI
Cons
- You host, you operate — infra overhead
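- CPU latency grows with every candidate, since each one needs its own cross-encoder pass; beyond ~50 candidates per query it is often too slow for interactive use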
- No SLA, no managed scaling
- Largest variants need a GPU to be practical
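Where the CPU slowdown kicks in depends on your hardware, passage length, and `max_length`, so it is worth measuring before choosing a candidate count. A small stdlib timing harness, assuming a scorer with the same shape as `CrossEncoder.predict` (a list of pairs in, a sequence of scores out); `time_rerank` is a hypothetical helper name, and the lambda scorer below is a no-op placeholder for the real model.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

ScorePairs = Callable[[List[Tuple[str, str]]], Sequence[float]]


def time_rerank(score_pairs: ScorePairs, query: str, passages: List[str],
                candidate_counts: Sequence[int], repeats: int = 3) -> Dict[int, float]:
    """Best-of-`repeats` wall-clock seconds to score the first n passages, for each n."""
    results: Dict[int, float] = {}
    for n in candidate_counts:
        if n > len(passages):
            raise ValueError(f"only {len(passages)} passages available, asked for {n}")
        pairs = [(query, p) for p in passages[:n]]
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            score_pairs(pairs)
            # Taking the minimum reduces noise from warm-up and background load
            best = min(best, time.perf_counter() - start)
        results[n] = best
    return results


# Placeholder scorer; swap in model.predict to measure the real reranker
timings = time_rerank(lambda pairs: [0.0] * len(pairs), "q", ["passage"] * 100, [10, 25, 50, 100])
for n, seconds in timings.items():
    print(f"{n:>4} candidates: {seconds * 1000:.2f} ms")
```

Use passages that resemble your real chunks in length, since longer inputs (up to `max_length`) cost more per pair.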
See a cross-encoder score live
Our demo uses a quantised open-weight cross-encoder — same family, running entirely in your browser.