100% in your browser

Live reranking demo

Paste a query and a few candidate passages. A real cross-encoder scores each (query, passage) pair and reorders the passages by relevance, running entirely on your device. The first run downloads the model, which is then cached; after that, scoring is near-instant.

Privacy & cost: everything here runs locally with transformers.js on ONNX Runtime Web. Your query and passages never leave the browser, there’s no API key, and there’s no per-call cost. With no server-side inference to pay for or abuse, this page can stay free.

How this demo works

The model is a cross-encoder: instead of embedding the query and each passage separately, it feeds each (query, passage) pair through the network together and outputs a single relevance score. Because the two texts attend to each other directly, the score is typically more accurate than cosine similarity between independent embeddings. The cost is one model call per candidate, because the score depends on both texts and cannot be precomputed for a passage alone.
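To make the contrast concrete, here is a minimal sketch of the two scoring styles. The `Embed` and `ScorePair` function types are hypothetical stand-ins for real models, not part of transformers.js; only the shape of the computation is the point.

```typescript
// Hypothetical model interfaces (assumptions for illustration only).
type Embed = (text: string) => number[];                      // bi-encoder: one text in, one vector out
type ScorePair = (query: string, passage: string) => number;  // cross-encoder: one pair in, one score out

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Bi-encoder style: query and passages are embedded independently, then compared.
function biEncoderScores(embed: Embed, query: string, passages: string[]): number[] {
  const q = embed(query); // the query is embedded once
  return passages.map((p) => cosineSimilarity(q, embed(p)));
}

// Cross-encoder style: the model sees both texts together, once per candidate.
function crossEncoderScores(scorePair: ScorePair, query: string, passages: string[]): number[] {
  return passages.map((p) => scorePair(query, p));
}
```

In the bi-encoder version the passage vectors could be computed ahead of time; in the cross-encoder version every candidate needs its own model call at query time.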

  1. Your query is paired with every candidate passage.
  2. Each pair is tokenised and run through the cross-encoder.
  3. The output logit is squashed to a 0–1 relevance score.
  4. Passages are sorted by score; you see how the order changes versus the input.
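
The four steps above can be sketched as a single function. This is an illustration under stated assumptions, not the demo’s actual source: `scoreLogit` is a hypothetical callback standing in for tokenisation plus the cross-encoder forward pass, and the logistic sigmoid is assumed as the squashing function. A real model call in the browser would normally be asynchronous; it is synchronous here to keep the sketch short.

```typescript
interface RankedPassage {
  passage: string;
  score: number;         // relevance in the 0–1 range
  originalIndex: number; // position in the input list, to show how the order changed
}

// Assumed squashing function: the logistic sigmoid maps any logit into (0, 1).
function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

function rerank(
  query: string,
  passages: string[],
  // Hypothetical stand-in for "tokenise the pair and run the cross-encoder".
  scoreLogit: (query: string, passage: string) => number,
): RankedPassage[] {
  const ranked: RankedPassage[] = passages.map((passage, originalIndex) => ({
    passage,
    score: sigmoid(scoreLogit(query, passage)), // steps 1 to 3: pair, score, squash
    originalIndex,
  }));
  // Step 4: highest score first.
  ranked.sort((a, b) => b.score - a.score);
  return ranked;
}

// Usage with a toy scorer that only rewards passages mentioning "Paris".
const toyScorer = (_query: string, passage: string): number =>
  passage.includes("Paris") ? 2 : -2;

const result = rerank(
  "What is the capital of France?",
  ["London is large.", "Paris is the capital of France.", "I like bananas."],
  toyScorer,
);
console.log(result.map((r) => `${r.originalIndex}: ${r.score.toFixed(2)}`));
```

Keeping `originalIndex` alongside each score is what lets a UI show how far each passage moved relative to the input order.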

That’s the same operation you’d run as the second stage of a RAG pipeline, after a first-stage retriever has produced the candidates; only here it happens in a browser tab instead of behind an API.
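
For context, a two-stage pipeline of that kind might look roughly like the sketch below. Both scorers are hypothetical callbacks (a cheap first-stage retriever and a more expensive reranker), and the cut-offs `k` and `n` are illustrative parameters, not values used by this demo.

```typescript
type Scorer = (query: string, passage: string) => number;

// Sort passages by a scorer, highest score first.
function sortByScore(query: string, passages: string[], score: Scorer): string[] {
  return passages
    .map((passage) => ({ passage, s: score(query, passage) }))
    .sort((a, b) => b.s - a.s)
    .map((x) => x.passage);
}

function retrieveThenRerank(
  query: string,
  corpus: string[],
  cheapScore: Scorer,  // stage 1: fast, approximate (hypothetical)
  rerankScore: Scorer, // stage 2: slower, more accurate (hypothetical)
  k: number,           // how many candidates stage 1 passes on
  n: number,           // how many passages to keep after reranking
): string[] {
  const candidates = sortByScore(query, corpus, cheapScore).slice(0, k);
  // The expensive scorer only ever sees the k candidates, not the whole corpus.
  return sortByScore(query, candidates, rerankScore).slice(0, n);
}
```

The design choice is that the expensive model runs `k` times per query regardless of corpus size, which is what makes one model call per candidate affordable.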

Tips: include a couple of clearly off-topic lines (the sample has “London” and “bananas”) to watch them sink to the bottom. Try rephrasing the query to see scores shift. First load is slower because of the model download; subsequent runs reuse the cached weights.

Read: what is a reranker? →