pw.xpacks.llm.rerankers
class CrossEncoderReranker(model_name, *, cache_strategy=None, **init_kwargs)
Pointwise cross-encoder reranker module.
Uses the CrossEncoder from the sentence_transformers library. For reference, check out the Cross encoders documentation.
- Parameters
  - model_name (str) – Embedding model to be used.
  - cache_strategy (CacheStrategy | None) – Defines the caching mechanism. To enable caching, a valid CacheStrategy should be provided. See Cache strategy for more information. Defaults to None.

Suggested model: cross-encoder/ms-marco-TinyBERT-L-2-v2
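Caching is disabled unless a strategy is passed explicitly. A minimal sketch of enabling it, assuming the cache strategies from pathway.udfs (for example DiskCache) are available in your Pathway version:
import pathway as pw
from pathway.xpacks.llm import rerankers

reranker = rerankers.CrossEncoderReranker(
    model_name="cross-encoder/ms-marco-TinyBERT-L-2-v2",
    # Assumed cache strategy from pathway.udfs; see the Cache strategy documentation.
    cache_strategy=pw.udfs.DiskCache(),
)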
Example:
import pathway as pw
import pandas as pd
from pathway.xpacks.llm import rerankers
reranker = rerankers.CrossEncoderReranker(model_name="cross-encoder/ms-marco-TinyBERT-L-2-v2")
docs = [{"text": "Something"}, {"text": "Something else"}, {"text": "Pathway"}]
df = pd.DataFrame({"docs": docs, "prompt": "query text"})
table = pw.debug.table_from_pandas(df)
table += table.select(
    reranker_scores=reranker(pw.this.docs["text"], pw.this.prompt)
)
table
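For intuition, the same pairwise scoring can be reproduced outside Pathway with the sentence_transformers CrossEncoder directly; a standalone sketch (illustration only, not the module's exact code):
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-TinyBERT-L-2-v2")
# Each pair is (query, document); a higher score means higher estimated relevance.
scores = model.predict([
    ("query text", "Pathway"),
    ("query text", "Something else"),
])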
__call__(doc, query, **kwargs)
Evaluates the doc against the query.
- Parameters
  - doc (pw.ColumnExpression[str]) – Document or document chunk to be scored.
  - query (pw.ColumnExpression[str]) – User query or prompt that will be used to evaluate relevance of the doc.
  - **kwargs – override for defaults set in the constructor.
class EncoderReranker(model_name, *, cache_strategy=None, **init_kwargs)
Pointwise encoder reranker module.
Uses the encoders from the sentence_transformers library. For reference, check out the Pretrained models documentation.
- Parameters
  - model_name (str) – Embedding model to be used.
  - cache_strategy (CacheStrategy | None) – Defines the caching mechanism. To enable caching, a valid CacheStrategy should be provided. See Cache strategy for more information. Defaults to None.

Suggested model: BAAI/bge-large-zh-v1.5
Example:
import pathway as pw
import pandas as pd
from pathway.xpacks.llm import rerankers
reranker = rerankers.EncoderReranker(model_name="BAAI/bge-large-zh-v1.5")
docs = [{"text": "Something"}, {"text": "Something else"}, {"text": "Pathway"}]
df = pd.DataFrame({"docs": docs, "prompt": "query text"})
table = pw.debug.table_from_pandas(df)
table += table.select(
    reranker_scores=reranker(pw.this.docs["text"], pw.this.prompt)
)
table
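Conceptually, the score comes from embedding the query and the document with the same encoder and comparing the embeddings. A standalone sketch of that idea with sentence_transformers (illustration only; the module's exact similarity computation may differ):
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-large-zh-v1.5")
query_emb = model.encode("query text", convert_to_tensor=True)
doc_emb = model.encode("Pathway", convert_to_tensor=True)
score = util.cos_sim(query_emb, doc_emb).item()  # cosine similarity as the relevance score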
__call__(doc, query, **kwargs)
Evaluates the doc against the query.
- Parameters
  - doc (pw.ColumnExpression[str]) – Document or document chunk to be scored.
  - query (pw.ColumnExpression[str]) – User query or prompt that will be used to evaluate relevance of the doc.
  - **kwargs – override for defaults set in the constructor.
class LLMReranker(llm, *, prompt_template=prompts.prompt_rerank, response_parser=prompts.parse_score_json)
Pointwise LLM reranking module.
Asks the LLM to rate the relevance of a given doc against a query on a scale from 1 to 5.
- Parameters
  - llm (BaseChat) – Chat instance to be called during reranking.
  - prompt_template (str | Callable[[str, str], str] | pw.UDF) – Template to be used for generating the prompt for the LLM. Defaults to prompts.prompt_rerank. The prompt template should accept two arguments, doc and query, and the resulting prompt should ask the LLM to return a JSON with a 'score' attribute.
  - response_parser (pw.UDF | Callable[[str], float]) – Function to parse the response from the LLM. Must take a string as input and return a float. Defaults to prompts.parse_score_json.
Example:
import pathway as pw
import pandas as pd
from pathway.xpacks.llm import rerankers, llms
chat = llms.OpenAIChat(model="gpt-4o-mini")
reranker = rerankers.LLMReranker(chat)
docs = [{"text": "Something"}, {"text": "Something else"}, {"text": "Pathway"}]
df = pd.DataFrame({"docs": docs, "prompt": "query text"})
table = pw.debug.table_from_pandas(df)
table += table.select(
    reranker_scores=reranker(pw.this.docs["text"], pw.this.prompt)
)
table
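Both the prompt template and the response parser can be customized. A minimal sketch, reusing the chat instance from the example above; the names my_rerank_prompt and my_score_parser are made up for illustration, and the defaults prompts.prompt_rerank / prompts.parse_score_json are usually sufficient:
import json

def my_rerank_prompt(doc: str, query: str) -> str:
    # Must ask the model to answer with a JSON containing a 'score' attribute.
    return (
        "Rate from 1 to 5 how relevant the document is to the query. "
        'Answer only with JSON like {"score": 3}.\n'
        f"Query: {query}\nDocument: {doc}"
    )

def my_score_parser(response: str) -> float:
    # Simplified parser; prompts.parse_score_json is the more robust default.
    return float(json.loads(response)["score"])

reranker = rerankers.LLMReranker(
    chat,
    prompt_template=my_rerank_prompt,
    response_parser=my_score_parser,
)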
__call__(doc, query)
Evaluates the doc against the query.
- Parameters
  - doc (pw.ColumnExpression[str]) – Document or document chunk to be scored.
  - query (pw.ColumnExpression[str]) – User query or prompt that will be used to evaluate relevance of the doc.
- Returns
pw.ColumnExpression[float] – A column with the scores for each document.
rerank_topk_filter(docs, scores, k=5)
Apply top-k filtering to docs using the relevance scores.
- Parameters
  - docs (list[dict[str, str | dict]]) – A column with lists of documents or chunks to rank. Each row in this column is filtered separately.
  - scores (list[float]) – A column with lists of re-ranking scores for chunks.
  - k (int) – The number of documents to keep after filtering.
Example:
import pathway as pw
from pathway.xpacks.llm import rerankers
import pandas as pd
retrieved_docs = [
    {"text": "Something"},
    {"text": "Something else"},
    {"text": "Pathway"},
]
df = pd.DataFrame({"docs": retrieved_docs, "reranker_scores": [1.0, 3.0, 2.0]})
table = pw.debug.table_from_pandas(df)
docs_table = table.reduce(
    doc_list=pw.reducers.tuple(pw.this.docs),
    score_list=pw.reducers.tuple(pw.this.reranker_scores),
)
docs_table = docs_table.select(
    docs_scores_tuple=rerankers.rerank_topk_filter(
        pw.this.doc_list, pw.this.score_list, 2
    )
)
docs_table = docs_table.select(
    doc_list=pw.this.docs_scores_tuple[0],
    score_list=pw.this.docs_scores_tuple[1],
)
pw.debug.compute_and_print(docs_table, include_id=False)
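With the scores above and k=2, the filter keeps the two highest-scoring documents, "Something else" (3.0) and "Pathway" (2.0); they are returned together with their scores as a single tuple column, which the last select unpacks back into doc_list and score_list.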