Welcome to the Pathway Developer Documentation!
Pathway is a Python data processing framework for analytics and AI pipelines over data streams. It's the ideal solution for real-time processing use cases like streaming ETL or RAG pipelines for unstructured data.
If you are looking for AI Pipelines, see the dedicated AI Pipelines documentation.
Key Features:
- Easy-to-use Python API: Pathway is fully compatible with Python. Use your favorite Python tools and ML libraries.
- Scalable Rust engine: your Python code is run by a powerful Rust engine with multithreading and multiprocessing. No JVM and no GIL!
- Stateful operations: use stateful and temporal operations such as groupby and windows.
- Incremental computations: using Differential Dataflow, Pathway takes care of out-of-order data points for you, in real time.
- Batch and streaming alike: use the same pipeline on static data and live data streams.
- In-memory data processing: real-time updates, reduced latency, and higher throughput.
- Easy deployment: deploy with Docker or Kubernetes. Pathway comes with an orchestrator and is fully compatible with OpenTelemetry.
- Exactly-once consistency: obtain the same results in both batch and streaming.
- Persistence and backfilling: save the state of the computation to quickly resume after a failure or a pipeline update.
- LLM tooling: online ML, RAG pipelines, vector indexes, and more. With Pathway, your ML pipeline works on fresh data.
- Connect to any data source: Pathway comes with 350+ connectors, including SharePoint, and you can implement your own.
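To make the ideas of incremental computation, out-of-order data, and "same results in batch and streaming" more concrete, here is a minimal plain-Python sketch. It does not use Pathway's API and is not how the Rust engine is implemented. The names `IncrementalWindowSum`, `batch_window_sums`, and the sample events are hypothetical, chosen only for illustration. The sketch keeps per-window sums up to date as events arrive, retracts a previously emitted result when a late event changes it, and checks that the final state matches a one-shot batch computation.

```python
from collections import defaultdict


class IncrementalWindowSum:
    """Illustrative only: per-window sums updated one event at a time."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.sums = defaultdict(int)

    def add(self, timestamp, value):
        # Assign the event to a tumbling window by its event time,
        # not by the order in which it arrived.
        window_start = (timestamp // self.window_size) * self.window_size
        changes = []
        if window_start in self.sums:
            # Retract the result previously emitted for this window (-1).
            changes.append((window_start, self.sums[window_start], -1))
        self.sums[window_start] += value
        # Emit the updated result for this window (+1).
        changes.append((window_start, self.sums[window_start], +1))
        return changes


def batch_window_sums(events, window_size):
    """Compute the same window sums in one pass over a static dataset."""
    sums = defaultdict(int)
    for timestamp, value in events:
        sums[(timestamp // window_size) * window_size] += value
    return dict(sums)


# (timestamp, value) pairs; the event at timestamp 3 arrives after timestamp 12.
events = [(1, 10), (12, 5), (3, 7), (15, 1)]

stream = IncrementalWindowSum(window_size=10)
change_log = [change for ts, v in events for change in stream.add(ts, v)]

# The streaming state converges to the batch result despite the late event.
print(dict(stream.sums) == batch_window_sums(sorted(events), 10))
```

Each entry in `change_log` is a `(window_start, sum, diff)` triple, where a diff of `-1` withdraws an earlier result and `+1` asserts the new one. Only the window touched by an event is updated, so nothing is recomputed from scratch.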
What's next?
- Installation
- Pathway Overview
- Examples
- Core concepts
- Why Pathway
- Streaming and Static Modes
- Batch Processing
- Deployment
- LLM tooling
GitHub repository
Pathway sources are available on GitHub. Don't hesitate to clone the repo and contribute!
License key
Some Pathway features, such as monitoring and advanced connectors (e.g., SharePoint), require a free license key. To obtain one, you need to register.