pw.xpacks.llm.embedders
Pathway embedder UDFs.
class pw.xpacks.llm.embedders.GeminiEmbedder(*, capacity=None, retry_strategy=None, cache_strategy=None, model='models/embedding-001', api_key=None, **gemini_kwargs)
Pathway wrapper for Google Gemini Embedding services.
The capacity, retry_strategy and cache_strategy need to be specified during object construction. All other arguments can be overridden during application.
- Parameters
  - capacity – Maximum number of concurrent operations allowed. Defaults to None, indicating no specific limit.
  - retry_strategy – Strategy for handling retries in case of failures. Defaults to None, meaning no retries.
  - cache_strategy – Defines the caching mechanism. To enable caching, a valid CacheStrategy should be provided. See Cache strategy for more information. Defaults to None.
  - model – ID of the model to use. Check the Gemini documentation for the list of available models. To specify the model in the UDF call, set it to None in the constructor.
  - api_key – API key for Gemini API services. Can be provided in the constructor, in __call__, or by setting the GOOGLE_API_KEY environment variable.
  - gemini_kwargs – Any other arguments accepted by the Gemini embedding service. Check the Gemini documentation for the list of accepted arguments.
Example:
import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.GeminiEmbedder(model="models/text-embedding-004")
t = pw.debug.table_from_markdown('''
txt
Text
''')
t.select(ret=embedder(pw.this.txt))
import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.GeminiEmbedder()
t = pw.debug.table_from_markdown('''
txt | model
Text | models/embedding-001
''')
t.select(ret=embedder(pw.this.txt, model=pw.this.model))
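The split between construction-time settings and arguments that can be overridden during application can be pictured with a small, library-free sketch. This is not Pathway's implementation: `SketchEmbedder` and `resolve_kwargs` are hypothetical names, and the rule that a call-time value wins over a constructor default is an assumption made for illustration.

```python
class SketchEmbedder:
    """Toy stand-in for an embedder UDF; not Pathway's implementation."""

    def __init__(self, *, capacity=None, **default_kwargs):
        # Construction-time only setting (hypothetical handling).
        self.capacity = capacity
        # Defaults that a later call may override, e.g. model or api_key.
        self.default_kwargs = default_kwargs

    def resolve_kwargs(self, **call_kwargs):
        # Assumed rule: call-time values win, and None means "not set here".
        merged = {k: v for k, v in self.default_kwargs.items() if v is not None}
        merged.update({k: v for k, v in call_kwargs.items() if v is not None})
        return merged


embedder = SketchEmbedder(
    capacity=4, model="models/embedding-001", api_key="constructor-key"
)
print(embedder.resolve_kwargs())
print(embedder.resolve_kwargs(model="models/text-embedding-004"))
```

Setting `model=None` in the constructor of this toy class leaves the key out of the defaults, so the value supplied in the call is the only one used.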
__call__(input, *args, **kwargs)
Embeds texts in a Column.
- Parameters
  - input – Column with texts to embed.
get_embedding_dimension(**kwargs)
Computes the number of dimensions of the embedder's output by asking the embedder to embed ".".
- Parameters
  - **kwargs – Parameters of the embedder. If unset, the defaults from the constructor are used.
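The probing idea behind get_embedding_dimension can be sketched without any embedding service: embed a one-character string and count the components of the resulting vector. The names `embedding_dimension` and `toy_embed` below are hypothetical, and the fixed 8-dimensional toy embedder merely stands in for a real model.

```python
from typing import Callable, Sequence


def embedding_dimension(embed: Callable[[str], Sequence[float]]) -> int:
    # Embed a one-character probe and count the vector's components.
    return len(embed("."))


def toy_embed(text: str) -> Sequence[float]:
    # Hypothetical embedder: a fixed 8-dimensional vector derived from the text.
    return [float((len(text) + i) % 3) for i in range(8)]


dim = embedding_dimension(toy_embed)
print(dim)
```

With a real embedder the probe costs one request, so it may be worth computing the dimension once and reusing the result.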
class pw.xpacks.llm.embedders.LiteLLMEmbedder(*, capacity=None, retry_strategy=None, cache_strategy=None, model=None, **llmlite_kwargs)
Pathway wrapper for litellm.embedding.
The model has to be specified either in the constructor call or in each application; no default is provided. The capacity, retry_strategy and cache_strategy need to be specified during object construction. All other arguments can be overridden during application.
- Parameters
  - capacity – Maximum number of concurrent operations allowed. Defaults to None, indicating no specific limit.
  - retry_strategy – Strategy for handling retries in case of failures. Defaults to None, meaning no retries.
  - cache_strategy – Defines the caching mechanism. To enable caching, a valid CacheStrategy should be provided. See Cache strategy for more information. Defaults to None.
  - model – The embedding model to use.
  - timeout – The timeout value for the API call. Defaults to 10 minutes.
  - litellm_call_id – The call ID for litellm logging.
  - litellm_logging_obj – The litellm logging object.
  - logger_fn – The logger function.
  - api_base – Optional. The base URL for the API.
  - api_version – Optional. The version of the API.
  - api_key – Optional. The API key to use.
  - api_type – Optional. The type of the API.
  - custom_llm_provider – The custom llm provider.
Any of the arguments listed above, other than capacity, retry_strategy and cache_strategy, can be provided either to the constructor or in the UDF call. To specify the model in the UDF call, leave it as None in the constructor.
Example:
import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.LiteLLMEmbedder(model="text-embedding-ada-002")
t = pw.debug.table_from_markdown('''
txt
Text
''')
t.select(ret=embedder(pw.this.txt))
import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.LiteLLMEmbedder()
t = pw.debug.table_from_markdown('''
txt | model
Text | text-embedding-ada-002
''')
t.select(ret=embedder(pw.this.txt, model=pw.this.model))
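The capacity parameter is described as the maximum number of concurrent operations. One common way to enforce such a bound in asynchronous Python is a semaphore; the sketch below illustrates that general technique and makes no claim about how Pathway implements it. `embed_all` and `fake_embed` are hypothetical names, and the sleep stands in for a network request.

```python
import asyncio


async def embed_all(texts, capacity=None):
    # capacity=None means no limit; otherwise at most `capacity` calls run at once.
    semaphore = asyncio.Semaphore(capacity) if capacity is not None else None
    running = 0
    peak = 0

    async def fake_embed(text):
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)  # stands in for a network request
        running -= 1
        return [float(len(text))]

    async def limited(text):
        if semaphore is None:
            return await fake_embed(text)
        async with semaphore:
            return await fake_embed(text)

    # gather returns results in the order the inputs were given.
    vectors = await asyncio.gather(*(limited(t) for t in texts))
    return vectors, peak


vectors, peak = asyncio.run(embed_all(["a", "bb", "ccc", "dddd"], capacity=2))
print(vectors, peak)
```

The `peak` counter records the highest number of simultaneous calls observed, which makes the effect of the limit easy to check.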
__call__(input, *args, **kwargs)
Embeds texts in a Column.
- Parameters
  - input – Column with texts to embed.
get_embedding_dimension(**kwargs)
Computes the number of dimensions of the embedder's output by asking the embedder to embed ".".
- Parameters
  - **kwargs – Parameters of the embedder. If unset, the defaults from the constructor are used.
class pw.xpacks.llm.embedders.OpenAIEmbedder(*, capacity=None, retry_strategy=None, cache_strategy=None, model='text-embedding-ada-002', **openai_kwargs)
Pathway wrapper for OpenAI Embedding services.
The capacity, retry_strategy and cache_strategy need to be specified during object construction. All other arguments can be overridden during application.
- Parameters
  - capacity – Maximum number of concurrent operations allowed. Defaults to None, indicating no specific limit.
  - retry_strategy – Strategy for handling retries in case of failures. Defaults to None, meaning no retries.
  - cache_strategy – Defines the caching mechanism. To enable caching, a valid CacheStrategy should be provided. See Cache strategy for more information. Defaults to None.
  - model – ID of the model to use. You can use the List models API to see all of your available models, or see Model overview for descriptions of them.
  - encoding_format – The format to return the embeddings in. Can be either float or base64.
  - user – A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.
  - extra_headers – Extra headers to send with the request.
  - extra_query – Additional query parameters to add to the request.
  - extra_body – Additional JSON properties to add to the request.
  - timeout – Timeout for requests, in seconds.
Any of the arguments listed above, other than capacity, retry_strategy and cache_strategy, can be provided either to the constructor or in the UDF call. To specify the model in the UDF call, set it to None in the constructor.
Example:
import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.OpenAIEmbedder(model="text-embedding-ada-002")
t = pw.debug.table_from_markdown('''
txt
Text
''')
t.select(ret=embedder(pw.this.txt))
import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.OpenAIEmbedder()
t = pw.debug.table_from_markdown('''
txt | model
Text | text-embedding-ada-002
''')
t.select(ret=embedder(pw.this.txt, model=pw.this.model))
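The encoding_format parameter accepts float or base64. As a rough illustration of how the two relate, the sketch below decodes a base64 payload into floats. It assumes the payload is a base64-encoded array of little-endian 32-bit floats, which is not stated in this reference, and it builds its sample payload locally rather than calling any API. `decode_base64_embedding` is a hypothetical helper, not part of Pathway or the OpenAI client.

```python
import base64
import struct


def decode_base64_embedding(payload: str) -> list:
    # Assumption: the bytes are consecutive little-endian float32 values.
    raw = base64.b64decode(payload)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))


# Build a sample payload locally so the sketch runs without any API call.
sample = [0.5, -1.25, 2.0]
payload = base64.b64encode(struct.pack("<3f", *sample)).decode("ascii")
decoded = decode_base64_embedding(payload)
print(decoded)
```

The sample values are exactly representable in 32-bit floats, so the round trip returns them unchanged; arbitrary values would come back with float32 rounding.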
__call__(input, *args, **kwargs)
Embeds texts in a Column.
- Parameters
  - input – Column with texts to embed.
get_embedding_dimension(**kwargs)
Computes the number of dimensions of the embedder's output by asking the embedder to embed ".".
- Parameters
  - **kwargs – Parameters of the embedder. If unset, the defaults from the constructor are used.
class pw.xpacks.llm.embedders.SentenceTransformerEmbedder(model, call_kwargs={}, device='cpu', **sentencetransformer_kwargs)
Pathway wrapper for Sentence-Transformers embedder.
- Parameters
  - model (str) – Model name or path.
  - call_kwargs (dict) – kwargs that will be passed to each call of encode. These can be overridden during each application. For possible arguments check the Sentence-Transformers documentation.
  - device (str) – Defines which device will be used to run the Pipeline.
  - sentencetransformer_kwargs – kwargs accepted during initialization of SentenceTransformers. For possible arguments check the Sentence-Transformers documentation.
Example:
import pathway as pw
from pathway.xpacks.llm import embedders
embedder = embedders.SentenceTransformerEmbedder(model="intfloat/e5-large-v2")
t = pw.debug.table_from_markdown('''
txt
Text
''')
t.select(ret=embedder(pw.this.txt))
__call__(input, *args, **kwargs)
Embeds texts in a Column.
- Parameters
  - input – Column with texts to embed.
get_embedding_dimension(**kwargs)
Computes the number of dimensions of the embedder's output by asking the embedder to embed ".".
- Parameters
  - **kwargs – Parameters of the embedder. If unset, the defaults from the constructor are used.