Ingesting Unstructured Data

Unstructured data, such as PDF or PPTX files, is difficult to handle with traditional data processing techniques. Pathway's LLM tooling lets you ingest and process this type of data efficiently.

There are two main approaches for ingesting unstructured data:

  1. Using a dedicated library: you can use a library such as Unstructured to parse the unstructured files (PDFs, PPTX, etc.) ingested by Pathway. Our Contextful Parsing RAG pipeline example shows how to do this.
  2. Using an LLM directly: you can query unstructured data such as PDFs with an LLM. For example, you can structure the data on the fly and insert it into PostgreSQL; our dedicated article explains how.
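
The second approach can be illustrated with a minimal, library-agnostic sketch. It does not use Pathway's API: `fake_llm`, the `FIELDS` schema, and the `reports` table are all hypothetical names chosen for illustration, the document text is assumed to be already extracted from the file, and an in-memory SQLite database stands in for PostgreSQL so the example runs without a server. A real pipeline would replace `fake_llm` with an actual model call.

```python
import json
import sqlite3
from typing import Callable

# Hypothetical schema: the fields we want the LLM to extract from each document.
FIELDS = ("company", "year", "revenue_musd")


def build_prompt(text: str) -> str:
    """Ask the model for a single JSON object containing the expected keys."""
    return (
        "Extract the following fields from the document and answer with a "
        f"single JSON object with keys {list(FIELDS)}.\n\nDocument:\n{text}"
    )


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption): returns a canned JSON answer."""
    return json.dumps({"company": "Acme", "year": 2023, "revenue_musd": 12.5})


def structure_document(text: str, llm: Callable[[str], str]) -> dict:
    """Turn free text into a record with exactly the expected fields."""
    answer = json.loads(llm(build_prompt(text)))
    # Keep only the expected keys; a field the model omitted becomes None.
    return {field: answer.get(field) for field in FIELDS}


def insert_record(conn: sqlite3.Connection, record: dict) -> None:
    """Insert one structured record using a parameterized query."""
    conn.execute(
        "INSERT INTO reports (company, year, revenue_musd) VALUES (?, ?, ?)",
        tuple(record[field] for field in FIELDS),
    )
    conn.commit()


# SQLite in memory stands in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (company TEXT, year INTEGER, revenue_musd REAL)")

record = structure_document("Acme reported revenue of $12.5M in 2023.", fake_llm)
insert_record(conn, record)
rows = conn.execute("SELECT company, year, revenue_musd FROM reports").fetchall()
```

Passing the model as a plain callable keeps the structuring step independent of any particular LLM provider, and restricting the record to the keys in `FIELDS` means unexpected keys in the model's answer never reach the database.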