pw.xpacks.llm.rerankers
class CrossEncoderReranker(model_name, *, cache_strategy=None, **init_kwargs)
Pointwise cross-encoder reranker module.
Uses the CrossEncoder from the sentence_transformers library. For reference, check out the Cross encoders documentation.
- Parameters
  - model_name (str) – Embedding model to be used.
  - cache_strategy (CacheStrategy | None) – Defines the caching mechanism. To enable caching, a valid CacheStrategy should be provided. See Cache strategy for more information. Defaults to None.

Suggested model: cross-encoder/ms-marco-TinyBERT-L-2-v2
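Caching is disabled unless a strategy is passed explicitly. A minimal sketch of enabling it, assuming the cache strategies from pathway.udfs (for example DiskCache) are available in your Pathway version:
import pathway as pw
from pathway.xpacks.llm import rerankers

reranker = rerankers.CrossEncoderReranker(
    model_name="cross-encoder/ms-marco-TinyBERT-L-2-v2",
    # Assumed cache strategy from pathway.udfs; see the Cache strategy documentation.
    cache_strategy=pw.udfs.DiskCache(),
)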
Example:
import pathway as pw
import pandas as pd
from pathway.xpacks.llm import rerankers
reranker = rerankers.CrossEncoderReranker(model_name="cross-encoder/ms-marco-TinyBERT-L-2-v2")
docs = [{"text": "Something"}, {"text": "Something else"}, {"text": "Pathway"}]
df = pd.DataFrame({"docs": docs, "prompt": "query text"})
table = pw.debug.table_from_pandas(df)
table += table.select(
    reranker_scores=reranker(pw.this.docs["text"], pw.this.prompt)
)
table
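For intuition, the same pairwise scoring can be reproduced outside Pathway with the sentence_transformers CrossEncoder directly; a standalone sketch (illustration only, not the module's exact code):
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-TinyBERT-L-2-v2")
# Each pair is (query, document); a higher score means higher estimated relevance.
scores = model.predict([
    ("query text", "Pathway"),
    ("query text", "Something else"),
])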
__call__(doc, query, **kwargs)
Evaluates the doc against the query.
- Parameters
  - doc (pw.ColumnExpression[str]) – Document or document chunk to be scored.
  - query (pw.ColumnExpression[str]) – User query or prompt that will be used to evaluate relevance of the doc.
  - **kwargs – override for defaults set in the constructor.
class EncoderReranker(model_name, *, cache_strategy=None, **init_kwargs)
Pointwise encoder reranker module.
Uses the encoders from the sentence_transformers library. For reference, check out the Pretrained models documentation.
- Parameters
  - model_name (str) – Embedding model to be used.
  - cache_strategy (CacheStrategy | None) – Defines the caching mechanism. To enable caching, a valid CacheStrategy should be provided. See Cache strategy for more information. Defaults to None.

Suggested model: BAAI/bge-large-zh-v1.5
Example:
import pathway as pw
import pandas as pd
from pathway.xpacks.llm import rerankers
reranker = rerankers.EncoderReranker(model_name="BAAI/bge-large-zh-v1.5")
docs = [{"text": "Something"}, {"text": "Something else"}, {"text": "Pathway"}]
df = pd.DataFrame({"docs": docs, "prompt": "query text"})
table = pw.debug.table_from_pandas(df)
table += table.select(
    reranker_scores=reranker(pw.this.docs["text"], pw.this.prompt)
)
table
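Conceptually, the score comes from embedding the query and the document with the same encoder and comparing the embeddings. A standalone sketch of that idea with sentence_transformers (illustration only; the module's exact similarity computation may differ):
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-large-zh-v1.5")
query_emb = model.encode("query text", convert_to_tensor=True)
doc_emb = model.encode("Pathway", convert_to_tensor=True)
score = util.cos_sim(query_emb, doc_emb).item()  # cosine similarity as the relevance score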
__call__(doc, query, **kwargs)
Evaluates the doc against the query.
- Parameters
  - doc (pw.ColumnExpression[str]) – Document or document chunk to be scored.
  - query (pw.ColumnExpression[str]) – User query or prompt that will be used to evaluate relevance of the doc.
  - **kwargs – override for defaults set in the constructor.
class LLMReranker(llm, *, prompt_template=prompts.prompt_rerank, response_parser=prompts.parse_score_json)
Pointwise LLM reranking module.
Asks the LLM to rate the relevance of a given doc against a query on a scale from 1 to 5.
- Parameters
  - llm (BaseChat) – Chat instance to be called during reranking.
  - prompt_template (str | Callable[[str, str], str] | pw.UDF) – Template to be used for generating the prompt for the LLM. Defaults to prompts.prompt_rerank. The prompt template should accept two arguments, doc and query, and the resulting prompt should ask the LLM to return a JSON with a 'score' attribute.
  - response_parser (pw.UDF | Callable[[str], float]) – Function to parse the response from the LLM. Must take a string as input and return a float. Defaults to prompts.parse_score_json.
Example:
import pathway as pw
import pandas as pd
from pathway.xpacks.llm import rerankers, llms
chat = llms.OpenAIChat(model="gpt-4o-mini")
reranker = rerankers.LLMReranker(chat)
docs = [{"text": "Something"}, {"text": "Something else"}, {"text": "Pathway"}]
df = pd.DataFrame({"docs": docs, "prompt": "query text"})
table = pw.debug.table_from_pandas(df)
table += table.select(
    reranker_scores=reranker(pw.this.docs["text"], pw.this.prompt)
)
table
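Both the prompt template and the response parser can be customized. A minimal sketch, reusing the chat instance from the example above; the names my_rerank_prompt and my_score_parser are made up for illustration, and the defaults prompts.prompt_rerank / prompts.parse_score_json are usually sufficient:
import json

def my_rerank_prompt(doc: str, query: str) -> str:
    # Must ask the model to answer with a JSON containing a 'score' attribute.
    return (
        "Rate from 1 to 5 how relevant the document is to the query. "
        'Answer only with JSON like {"score": 3}.\n'
        f"Query: {query}\nDocument: {doc}"
    )

def my_score_parser(response: str) -> float:
    # Simplified parser; prompts.parse_score_json is the more robust default.
    return float(json.loads(response)["score"])

reranker = rerankers.LLMReranker(
    chat,
    prompt_template=my_rerank_prompt,
    response_parser=my_score_parser,
)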
__call__(doc, query)
Evaluates the doc against the query.
- Parameters
  - doc (pw.ColumnExpression[str]) – Document or document chunk to be scored.
  - query (pw.ColumnExpression[str]) – User query or prompt that will be used to evaluate relevance of the doc.
- Returns
pw.ColumnExpression[float] – A column with the scores for each document.
rerank_topk_filter(docs, scores, k=5)
Apply top-k filtering to docs using the relevance scores.
- Parameters
  - docs (list[dict[str, str | dict]]) – A column with lists of documents or chunks to rank. Each row in this column is filtered separately.
  - scores (list[float]) – A column with lists of re-ranking scores for chunks.
  - k (int) – The number of documents to keep after filtering.
Example:
import pathway as pw
from pathway.xpacks.llm import rerankers
import pandas as pd
retrieved_docs = [
    {"text": "Something"},
    {"text": "Something else"},
    {"text": "Pathway"},
]
df = pd.DataFrame({"docs": retrieved_docs, "reranker_scores": [1.0, 3.0, 2.0]})
table = pw.debug.table_from_pandas(df)
docs_table = table.reduce(
    doc_list=pw.reducers.tuple(pw.this.docs),
    score_list=pw.reducers.tuple(pw.this.reranker_scores),
)
docs_table = docs_table.select(
    docs_scores_tuple=rerankers.rerank_topk_filter(
        pw.this.doc_list, pw.this.score_list, 2
    )
)
docs_table = docs_table.select(
    doc_list=pw.this.docs_scores_tuple[0],
    score_list=pw.this.docs_scores_tuple[1],
)
pw.debug.compute_and_print(docs_table, include_id=False)
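With the scores above and k=2, the filter keeps the two highest-scoring documents, "Something else" (3.0) and "Pathway" (2.0); they are returned together with their scores as a single tuple column, which the last select unpacks back into doc_list and score_list.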