Notebook-to-Docker Conversion

Jupyter notebooks and Docker are two convenient ways to run Pathway code. Jupyter notebooks are useful for exploration and interactive development while Docker is used for deployment.

In notebooks, shell commands, code, and explanations are intertwined. To run a notebook's code with Docker, you need to extract the shell commands, such as pip install, and have Docker run them separately.

⚠️ In Jupyter notebooks, the exclamation mark (!) runs a shell command from inside a code cell. These commands must be removed from the code and added to the Dockerfile so that the notebook can run as a regular Python file.

This tutorial shows you how to convert your Jupyter notebook so that it works with Docker, by following these steps:

  1. Create a Dockerfile: This file defines the instructions for building your Docker image.
  2. Add the dependencies to the Dockerfile: Specify any libraries your code requires.
  3. Customize the Dockerfile (Optional): Add additional shell commands for specific needs.
  4. Refactor Pathway Code: Remove unnecessary code specific to the notebook environment such as the shell commands.
  5. Run with Docker: Execute your code using the created Docker image.

1. Dockerfile

The Docker deployment is done using a Dockerfile. This file contains the instructions used by Docker to build a container image. Pathway comes with its own Docker image. To use it, create a file called Dockerfile and reference the Pathway image with FROM:

Dockerfile
FROM pathwaycom/pathway:latest

COPY . .

CMD [ "python", "./your-script.py" ]

You can also use a regular Python image; you can learn more about it in the dedicated article.

2. Dependencies

In Jupyter notebooks, dependencies are installed from a code cell, using the exclamation mark (!) to run pip install as a shell command. For example, suppose you want to install langchain, langchain_community, and langchain_openai. In a Jupyter notebook, you would create a code cell like this:

!pip install langchain
!pip install langchain_community
!pip install langchain_openai

This cell would not work in a regular Python file. You need to remove those lines and install the dependencies via the Dockerfile.

There are two main approaches to manage dependencies in a Dockerfile:

  1. Installing the dependencies manually with pip install commands.
  2. Using a requirements.txt file.

Solution 1: Using pip install commands

If you have a small number of dependencies, you can directly list the installation commands within the Dockerfile:

Dockerfile
FROM pathwaycom/pathway:latest

RUN pip install langchain
RUN pip install langchain_community
RUN pip install langchain_openai

COPY . .

CMD [ "python", "./your-script.py" ]

Replace langchain, langchain_community, and langchain_openai with the actual libraries your code uses.

Solution 2: Using a requirements.txt file

For a larger number of dependencies, consider creating a requirements.txt file that lists them:

requirements.txt
langchain
langchain_community
langchain_openai

Then, update your Dockerfile to install dependencies from this file:

Dockerfile
FROM pathwaycom/pathway:latest

# Copy requirements file and install dependencies
COPY requirements.txt ./
RUN pip install -r ./requirements.txt

COPY . .

CMD [ "python", "./your-script.py" ]

Choose the method that best suits your project's complexity.

3. Customize the Dockerfile

As with dependencies, your notebook may contain other shell commands that you need to execute.

For example, suppose that your Jupyter notebook downloads data using the following cell:

!wget -nc https://your-data-add.com/data

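This will not work if your Jupyter notebook is executed as a regular Python file. You need to remove this line and add the command to the Dockerfile. A RUN instruction is executed when the image is built, so the data is downloaded once and stored in the image: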

Dockerfile
FROM pathwaycom/pathway:latest

# Copy requirements file and install dependencies
COPY requirements.txt ./
RUN pip install -r ./requirements.txt

RUN wget -nc https://your-data-add.com/data

COPY . .

CMD [ "python", "./your-script.py" ]

⚠️ Note that the command does not have the exclamation mark (!) anymore.

You should do this step for each shell command in your Jupyter notebook.
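If your notebook has many shell commands, listing them by hand can be tedious. The following is a minimal sketch, not part of Pathway, of how the `!` lines of a cell could be turned into RUN instructions. The function name shell_lines_to_run_instructions is hypothetical, and the sketch assumes each command fits on a single line.

```python
from __future__ import annotations


def shell_lines_to_run_instructions(cell_source: str) -> list[str]:
    """Turn notebook shell lines (`!cmd`) into Dockerfile RUN instructions."""
    instructions = []
    for line in cell_source.splitlines():
        stripped = line.strip()
        if stripped.startswith("!"):
            # Drop the exclamation mark: Dockerfile commands do not use it.
            command = stripped[1:].strip()
            if command:
                instructions.append(f"RUN {command}")
    return instructions


cell = """!pip install langchain
!wget -nc https://your-data-add.com/data
import pathway as pw
"""
for instruction in shell_lines_to_run_instructions(cell):
    print(instruction)
# RUN pip install langchain
# RUN wget -nc https://your-data-add.com/data
```

Review the generated lines before pasting them into your Dockerfile: some commands only make sense in an interactive session and should simply be dropped.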

4. Refactor Pathway Code

You need to convert your .ipynb file to a regular .py Python file. You can do it directly from JupyterLab by File -> Save and Export Notebook as... -> Executable Script.
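If you prefer to do the conversion without JupyterLab, a notebook is a JSON file whose code cells can be concatenated into a script. The sketch below uses only the Python standard library and assumes the usual notebook layout (a top-level "cells" list with "cell_type" and "source" fields). The function name notebook_to_script and the file names are hypothetical.

```python
import json
from pathlib import Path


def notebook_to_script(notebook_path: str, script_path: str) -> str:
    """Write the code cells of a notebook to a .py file and return the script."""
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    chunks = []
    for cell in notebook.get("cells", []):
        # Markdown and raw cells are explanations, not code: skip them.
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # "source" is stored either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip() + "\n")
    # Separate cells with a blank line.
    script = "\n".join(chunks)
    Path(script_path).write_text(script, encoding="utf-8")
    return script


# Example call (hypothetical file names):
# notebook_to_script("your-notebook.ipynb", "your-script.py")
```

This sketch copies the cells verbatim, so the shell commands are still present in the resulting file and have to be removed afterwards.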

Remove all unnecessary commands

You need to remove any code specifically used for the interactive notebook environment (e.g., displaying visualizations). Don't forget to remove all the shell commands.
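As an illustration, the shell commands can be filtered out of the exported script automatically. The sketch below is one possible approach, with a hypothetical function name strip_notebook_lines. It drops lines starting with ! or %, and also lines starting with get_ipython(), since some export paths rewrite shell commands into that form. It assumes each command sits on its own line and is not the only statement inside an indented block.

```python
import re

# Shell escapes (`!cmd`) and magics (`%magic`) as written in a notebook cell.
NOTEBOOK_PREFIXES = ("!", "%")
# The same commands once an export has rewritten them as IPython calls.
IPYTHON_CALL = re.compile(r"^\s*get_ipython\(\)\.")


def strip_notebook_lines(script: str) -> str:
    """Return the script without notebook-only shell and magic lines."""
    kept = []
    for line in script.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(NOTEBOOK_PREFIXES) or IPYTHON_CALL.match(line):
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


exported = (
    "get_ipython().system('pip install langchain')\n"
    "import pathway as pw\n"
    "!wget -nc https://your-data-add.com/data\n"
    "print('ready')\n"
)
print(strip_notebook_lines(exported))
```

Check the result by hand: the filter only looks at the start of each line, so visualization calls and other notebook-specific code still need to be removed manually.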

(Optional) Switch to streaming

To switch your example from static to streaming, you need to:

  • Remove all pw.debug references.
  • Check that all your connectors are in streaming mode (you can set them to streaming with mode="streaming").
  • Add a pw.run().
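The three changes above can be sketched as follows. This is an illustrative example, not the code of a specific notebook: the function name run_streaming_pipeline, the schema, the column name value, and the paths are all assumptions. It reads CSV files with a connector in streaming mode, writes the result with an output connector instead of pw.debug, and ends with pw.run().

```python
def run_streaming_pipeline(input_dir: str, output_file: str) -> None:
    """Sum a hypothetical `value` column over CSV files arriving in input_dir."""
    # Imported here so that loading this file does not start any computation.
    import pathway as pw

    class InputSchema(pw.Schema):
        # Hypothetical schema: a single integer column.
        value: int

    # Input connector in streaming mode: new files are picked up as they arrive.
    table = pw.io.csv.read(input_dir, schema=InputSchema, mode="streaming")

    total = table.reduce(total=pw.reducers.sum(pw.this.value))

    # An output connector takes the place of the pw.debug calls.
    pw.io.csv.write(total, output_file)

    # Starts the computation; in streaming mode it runs until stopped.
    pw.run()


# Example call (hypothetical paths), to put at the end of your-script.py:
# run_streaming_pipeline("./data/", "./output.csv")
```

Because pw.run() does not return in streaming mode, the call is left commented out here; in your script it would be the last statement.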

To learn more about how to switch from batch to streaming, read our dedicated tutorial.

5. Run with Docker

Now that your Dockerfile and your Python files are ready, you can build and run the Docker image:

docker build -t my-pathway-app .
docker run -it --rm --name my-pathway-app my-pathway-app