bge-reranker

Open weights · BAAI · free to self-host

Open weights · Self-host · Multilingual

The bge-reranker family from BAAI (Beijing Academy of Artificial Intelligence) is a set of open-weight cross-encoders trained specifically for passage reranking. They're the standard self-hosted option: free to download, easy to drop into sentence-transformers, and competitive with commercial reranking APIs on English benchmarks.

Model variants

| Model | Parameters | Languages | Best for |
|---|---|---|---|
| BAAI/bge-reranker-base | 278M | English, Chinese | Fast CPU inference, good baseline |
| BAAI/bge-reranker-large | 560M | English, Chinese | Stronger English quality |
| BAAI/bge-reranker-v2-m3 | 568M | 100+ langs | Best all-round; multilingual powerhouse |
| BAAI/bge-reranker-v2-gemma | 2.5B | 100+ langs | Highest quality; needs GPU |

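Recommendation: start with bge-reranker-v2-m3. It's multilingual, has strong BEIR numbers, and works fine on CPU for moderate traffic with short candidate lists. Upgrade to the Gemma variant only if you have a GPU and need maximum quality. The Gemma variant is an LLM-based reranker, and BAAI's model card loads it through its FlagEmbedding library, so the CrossEncoder snippets on this page may not work for it unchanged.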

Benchmarks

| Model | BEIR NDCG@10 (avg) | MS MARCO MRR@10 |
|---|---|---|
| bge-reranker-base | ~56.8 | ~39.0 |
| bge-reranker-large | ~58.4 | ~40.7 |
| bge-reranker-v2-m3 | ~60.1 | ~41.1 |
| bge-reranker-v2-gemma | ~61.8 | ~42.0 |

BEIR scores are approximate averages across the 18 BEIR datasets; MS MARCO scores are approximate MRR@10 on that benchmark alone. Results vary by dataset, so always evaluate on your own domain.
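To evaluate on your own domain, you need the same two metrics the table reports. The sketch below is a minimal pure-Python version, not the official BEIR or MS MARCO evaluation code. It assumes you have graded relevance labels for the passages of each query (0 means irrelevant), uses linear gain for DCG, and computes the ideal ranking only from the labels you pass in. The helper names (rank_labels, ndcg_at_k, mrr_at_k) are illustrative, not part of any library.

```python
import math
from typing import List, Sequence


def rank_labels(scores: Sequence[float], labels: Sequence[float]) -> List[float]:
    """Reorder relevance labels by reranker score, highest score first."""
    ordered = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return [label for _, label in ordered]


def dcg_at_k(ranked_labels: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top k labels (linear gain)."""
    return sum(
        label / math.log2(position + 2)
        for position, label in enumerate(ranked_labels[:k])
    )


def ndcg_at_k(ranked_labels: Sequence[float], k: int = 10) -> float:
    """DCG divided by the DCG of the best possible ordering of the same labels."""
    ideal = dcg_at_k(sorted(ranked_labels, reverse=True), k)
    return dcg_at_k(ranked_labels, k) / ideal if ideal > 0 else 0.0


def mrr_at_k(ranked_labels: Sequence[float], k: int = 10) -> float:
    """Reciprocal rank of the first relevant passage in the top k, else 0."""
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label > 0:
            return 1.0 / rank
    return 0.0
```

For one query, pass the reranker's scores and your labels through rank_labels, then feed the result to ndcg_at_k and mrr_at_k. Average each metric over all queries to get a number comparable in spirit, though not in exact protocol, to the table above.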

Quick start

Install

pip install sentence-transformers

Rerank a list of passages

from sentence_transformers import CrossEncoder

model = CrossEncoder("BAAI/bge-reranker-v2-m3", max_length=512)

query = "How do I add reranking to my RAG pipeline?"
passages = [
    "Rerankers score each query-passage pair with a cross-encoder.",
    "BM25 is a classical keyword-based retrieval method.",
    "London is the capital of the United Kingdom.",
    "Two-stage retrieval: retrieve 50 candidates, rerank to top 5.",
]

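# predict() returns one relevance score per (query, passage) pair
scores = model.predict([(query, p) for p in passages])
# Sort on the score only, highest first
ranked = sorted(zip(scores, passages), key=lambda pair: pair[0], reverse=True)

for score, text in ranked:
    print(f"{score:.4f}  {text[:80]}")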

In a RAG pipeline

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-v2-m3")

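def rag_answer(query: str, vector_db, llm) -> str:
    # vector_db and llm are placeholders for your own clients;
    # search() is assumed to return a list of passage strings.
    # Stage 1: retrieve wide
    candidates = vector_db.search(query, top_k=50)
    # Stage 2: rerank tight (sort on score only, so ties never compare passages)
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    top5 = [c for _, c in ranked[:5]]
    # Stage 3: generate
    context = "\n\n".join(top5)
    return llm.complete(f"Context:\n{context}\n\nQ: {query}")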

Pros and cons

Pros

  • Completely free — no API key, no per-call cost
  • Strong BEIR numbers competitive with commercial APIs
  • Easy integration via sentence-transformers / HuggingFace
  • Multilingual (v2-m3 covers 100+ languages)
  • Full control: quantise, fine-tune, distil
  • Active development from BAAI

Cons

  • You host, you operate — infra overhead
  • CPU inference is slow beyond ~50 candidates
  • No SLA, no managed scaling
  • Largest variants need a GPU to be practical
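Whether CPU inference is fast enough depends on your hardware, passage length, and candidate count, so it is worth measuring before you commit. The sketch below is one hedged way to do that: a small timing helper that accepts any scoring callable. The names median_latency_ms and fake_score are illustrative; fake_score is a stand-in so the snippet runs without downloading a model.

```python
import time
from statistics import median
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def median_latency_ms(
    score_fn: Callable[[List[Pair]], Sequence[float]],
    query: str,
    passages: Sequence[str],
    repeats: int = 5,
) -> float:
    """Median wall-clock time, in milliseconds, to score every passage once."""
    pairs = [(query, passage) for passage in passages]
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        score_fn(pairs)
        timings.append((time.perf_counter() - start) * 1000.0)
    return median(timings)


def fake_score(pairs: List[Pair]) -> List[float]:
    """Stand-in scorer (passage length); swap in a real reranker's predict."""
    return [float(len(passage)) for _, passage in pairs]


# Compare how latency grows with the number of candidates.
for n_candidates in (10, 50, 100):
    sample = ["example passage text"] * n_candidates
    latency = median_latency_ms(fake_score, "example query", sample)
    print(f"{n_candidates} candidates: {latency:.3f} ms")
```

To time a real model, pass reranker.predict from the examples above as score_fn and use passages of realistic length, since cross-encoder cost grows with both candidate count and token count.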

See a cross-encoder score live

Our demo uses a quantised open-weight cross-encoder — same family, running entirely in your browser.
