How AI Agents in Finance Are Transforming Financial Due Diligence: FA3STER

Pathway Community
Published March 26, 2025 · Updated March 26, 2025

Introduction

Financial due diligence represents a highly resource-intensive, time-consuming and painstakingly manual process, often extending over several weeks or months. This complexity arises from the need for exhaustive, detailed analysis and reasoning across financial datasets, which often require frequent updates.

In our approach, we harness Pathway's dynamic capabilities to develop an Agentic RAG (Retrieval-Augmented Generation) system. The goal is a solution that autonomously retrieves, analyzes, and synthesizes information from diverse documents, solving real-world challenges efficiently.

Our solution, designed around the core objectives outlined earlier, directly addresses the key pain points of financial due diligence (FDD). It is tailored to serve all of the stakeholders of FDD, from investors to analysts and lawyers, by automating document analysis, accurately responding to queries, and dynamically adapting to changing document datasets. Additionally, it generates summarized, concise FDD reports and quick-look dashboards for targeted firms, offering a strong starting point in the due diligence process. It can also be used independently to perform Q&A on financial documents.

A Quick 3-Minute Overview of FA3STER – an Autonomous, Multi-Agent Real-Time RAG System

Why does it stand out?

While generic RAG-based systems for document Q&A are available, they offer only moderate accuracy and leave considerable room for error. Given the high stakes involved in FDD, such systems cannot be relied upon. FA3STER differs in the following ways:

  • Real-World Application: Our solution is not just an abstract RAG implementation—it is purpose-built to handle the complexities of financial due diligence (FDD).
  • Versatile Functionality: Beyond streamlining the FDD process, our solution can also act as an agentic RAG-based chatbot for financial documents, capable of answering complex queries related to financial documents.
  • Innovative Architecture: Our approach is not just about using RAG but refining it. This is validated by both theoretical intuitions and empirical results.
  • Unmatched Offering: To our knowledge, the competitive landscape lacks a dedicated solution that combines all of these capabilities for the specific use case of enhancing the FDD process.

Solution Overview

Financial Agentic Autonomous and Accurate System Through Evolving (Dynamic) Retrieval Augmented Generation, or FA3STER, is designed to address the challenges of FDD. At the heart of FA3STER lies a four-component architecture:

1. Intelligent Retrieval with Context-Aware Chunking

FA3STER enhances document parsing through an Agentic Chunker, ensuring contextually relevant information retrieval. Powered by Pathway’s real-time streaming and dynamic indexing, this phase improves accuracy while keeping data fresh.

2. Autonomous Post-Retrieval Processing

Our system goes beyond simple data retrieval by integrating specialized agents for finance, document grading, SQL querying, and data visualization. These agents intelligently analyze, cross-validate, and optimize workflows—minimizing errors and ensuring reliable insights.

3. The Vertical Autonomous Layer (The Game-Changer)

Rather than just stacking more tools, FA3STER introduces a self-improving network of interlinked agents that:

  • Collaborate, cross-check findings, and exchange knowledge
  • Iterate through multiple rounds to refine financial insights
  • Generate concise, actionable FDD reports and real-time dashboards

4. Transparent and Interactive UI

Users get a real-time agent workflow visualization, allowing complete transparency into how FA3STER processes financial documents. The interactive UI makes complex operations accessible, ensuring seamless user experience.

System Architecture

Our multi-agent RAG system ensures precision, seamless integration, and robust performance in financial analysis. It leverages Pathway’s Dynamic Indexing Pipeline, integrating Vector Store and Google Drive Connector for real-time data updates.

Key components include:

  • Agentic Chunker (GPT-4o-mini) – Enhances document fragments for context-aware retrieval.
  • Unstructured’s Parser & OpenAI’s text-embedding-3-small – Processes unstructured data and enables hybrid indexing with semantic search and metadata filtering.
  • Cohere’s Re-ranker – Optimizes complex query decomposition and ranking.
  • LangGraph & SELF-RAG Architecture – Drives post-retrieval decision-making with specialized financial agents.
  • Tavily Search & Reasoning Agents – Enrich data with external sources and generate insightful visualizations.

Code Repository – Complete Setup & Usage Guide

GitHub: lalit-03/fa3ster-iitp-interiit-tech

Pre-Retrieval and Retrieval

  1. Pathway’s Real-Time Streaming Framework
    Pathway's Google Drive connector dynamically streamlines data ingestion into the RAG system, supporting both real-time updates and one-time imports.
  2. Pre-Retrieval Workflow and Data Processing
    Since the documents are mainly in PDF format, they must be converted to embeddings before they can be used for retrieval. Each document is therefore parsed, chunked, embedded, and finally stored in the vector store.
  • Unstructured’s Parser: Unstructured.io's parsing tools excel at extracting structured data from various financial document formats, including tables, lists, and key-value pairs.
  • Agentic Chunker (GPT-4o-mini)
    • Goes beyond standard segmentation by enriching document chunks with contextual metadata.
    • Enhances retrieval accuracy by incorporating semantic relationships across different document sections.
    • Addresses limitations of traditional retrieval methods by improving nuanced understanding and relevance ranking.
  • OpenAI’s text-embedding-3-small: OpenAI's text-embedding-3-small model generates vector representations of text, capturing semantic meaning.
  • Pathway’s Vector Store
    • Optimized for storing and retrieving embeddings, enabling efficient similarity search.
    • Supports dynamic updates, ensuring the database remains up to date as new financial data is ingested.
    • Implements Hybrid Indexing, combining:
      • Semantic relevance (vector embeddings).
      • Metadata-based filtering (e.g., company name, stock ticker, document date).
    • Provides scalability for large financial datasets.
  3. Query Decomposer: Breaks down complex financial queries into smaller subqueries, allowing retrieval of relevant documents from multiple perspectives.
  4. Cohere Re-ranker: Assigns a relevance score to each document by understanding the query's intent, ensuring that the most relevant knowledge base entries are prioritized in responses.
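
The hybrid indexing idea described above (metadata filtering combined with vector similarity) can be illustrated with a minimal, library-free sketch. This is not Pathway's Vector Store API: the `Chunk` class, the `hybrid_search` function, and the toy three-dimensional embeddings are hypothetical stand-ins chosen for illustration only.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A stored document chunk: text, its embedding, and filterable metadata."""
    text: str
    embedding: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_search(chunks, query_embedding, filters, k=2):
    """Keep chunks whose metadata matches every filter, then rank by similarity."""
    candidates = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in filters.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda c: cosine_similarity(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:k]


# Toy data (hypothetical): real embeddings would come from an embedding model.
store = [
    Chunk("revenue discussion", [0.9, 0.1, 0.0], {"ticker": "AAPL"}),
    Chunk("legal discussion", [0.1, 0.9, 0.0], {"ticker": "AAPL"}),
    Chunk("revenue discussion", [0.9, 0.1, 0.0], {"ticker": "MSFT"}),
]
results = hybrid_search(store, [1.0, 0.0, 0.0], {"ticker": "AAPL"}, k=1)
```

Filtering before ranking means a chunk from another company can never outrank an on-target chunk, no matter how similar its embedding is to the query.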

Post-Retrieval

FA3STER’s post-retrieval phase transforms raw data into highly relevant financial insights using an Agentic-RAG framework powered by LangGraph. This hybrid system integrates specialized agents, ensuring accuracy, efficiency, and adaptability in financial due diligence.

1. Intelligent Document Retrieval

The process begins by retrieving the most contextually relevant financial documents from the Pathway Vector Store. FA3STER enhances transparency by preserving metadata, ensuring users can track the source and reliability of retrieved data.

2. Smart Document Grading & Routing

Retrieved documents undergo automated evaluation based on:

  • Relevance & Quality – Ensuring only high-value data is processed.
  • Content-Type Detection – Routing subqueries based on document type:
    • Transform Query Agent – Refines unclear queries for better results.
    • SQL Agent – Extracts structured financial data from databases.
    • Finance Agent – Analyzes market trends, stock performance, and key financial metrics.
    • Generate Node – If documents fully answer the query, they proceed to final output generation.
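
The grading and routing step above can be sketched as a small decision function. The `Grade` fields, the `Route` names, and the ordering of the rules are assumptions made for illustration; the actual system uses LangGraph and LLM-based graders, whose exact logic is not described here.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    TRANSFORM_QUERY = "transform_query"
    SQL_AGENT = "sql_agent"
    FINANCE_AGENT = "finance_agent"
    GENERATE = "generate"


@dataclass
class Grade:
    """Hypothetical grader output for one subquery and its retrieved documents."""
    relevant: bool        # did retrieval return on-topic documents?
    fully_answers: bool   # do the documents answer the subquery on their own?
    content_type: str     # assumed labels, e.g. "text", "table", "market"


def route(grade: Grade) -> Route:
    """Pick the next node for a graded subquery (assumed rule order)."""
    if not grade.relevant:
        # Nothing useful retrieved: rewrite the query and try again.
        return Route.TRANSFORM_QUERY
    if grade.fully_answers:
        # Documents are sufficient: go straight to answer generation.
        return Route.GENERATE
    if grade.content_type == "table":
        # Structured data is handled by the SQL agent.
        return Route.SQL_AGENT
    # Everything else needing further analysis goes to the finance agent.
    return Route.FINANCE_AGENT
```

For example, `route(Grade(relevant=True, fully_answers=False, content_type="table"))` selects the SQL agent under these assumed rules.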

3. Real-Time Web Search for External Insights

When internal data is insufficient, FA3STER utilizes Tavily Search to fetch real-time financial insights from external sources. If results lack precision, the Finance Agent cross-verifies retrieved data and refines the response.

4. Financial Data Interpretation & Visualization

  • Reasoning Agent – Converts financial data into statistical graphs and trend visualizations, making insights actionable and easier to interpret.

5. Fact-Checked Response Generation

  • Fact Verification Agent – Ensures coherence and eliminates inconsistencies before final output.
  • Aggregator Agent – Merges multiple subquery responses into a single, well-structured report.

Vertical Autonomous Layer:

Revolutionizing Due Diligence with Vertical Scaling

Instead of expanding FA3STER horizontally, we introduced a Vertical Autonomous Layer—a self-optimizing intelligence layer that enhances speed, accuracy, and efficiency in financial due diligence. This layer generates a concise FDD report in 10–12 minutes, offering stakeholders a quick yet highly detailed financial assessment. With a processing cost of only ₹15–₹20 ($0.20–$0.25) per report, FA3STER ensures affordable and scalable automation.

Key Components of the Autonomous Layer

  • Key Metrics Agent – Analyzes revenue, profit margins, and growth rates to assess financial health.
  • Business Agent – Evaluates market conditions, competitive positioning, and risk factors.
  • Executive Agent – Assesses governance, compliance, and long-term strategic alignment.

How FA3STER’s Autonomous Layer Works

FA3STER’s agents operate in two iterative modes:

  1. Q&A Mode – Agents generate domain-specific financial queries, retrieving insights using the RAG system.
  2. Discussion Mode – Agents collaborate, refine findings, and generate new investigative questions to enhance financial accuracy.

The cycle repeats until the majority of agents agree on the accuracy and completeness of the analysis. The final Quick Overview Panel displays summarized insights, making it easy for stakeholders to interact with key financial data instantly.
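
The iterate-until-majority cycle described above can be sketched as follows. Agents are modelled as plain functions standing in for LLM-backed agents, and `max_rounds` is an assumed safety cap that the description above does not mention; all names here are hypothetical.

```python
from typing import Callable

# An agent takes (round_number, shared_findings) and returns
# (new_finding, approves_current_analysis).
Agent = Callable[[int, list[str]], tuple[str, bool]]


def run_until_majority(agents: dict[str, Agent], max_rounds: int = 5):
    """Alternate Q&A and discussion until more than half the agents approve."""
    findings: list[str] = []
    for round_number in range(1, max_rounds + 1):
        # Q&A mode: each agent produces a finding from the shared context.
        results = {
            name: agent(round_number, list(findings))
            for name, agent in agents.items()
        }
        # Discussion mode: pool the new findings so the next round can use them.
        findings.extend(finding for finding, _ in results.values())
        votes = sum(1 for _, approves in results.values() if approves)
        if votes > len(agents) / 2:
            return round_number, findings
    return max_rounds, findings


def make_stub_agent(name: str, approve_from_round: int) -> Agent:
    """Stub agent that starts approving at a fixed round (illustration only)."""
    def agent(round_number: int, shared: list[str]) -> tuple[str, bool]:
        finding = f"{name}: finding from round {round_number}"
        return finding, round_number >= approve_from_round
    return agent


agents = {
    "key_metrics": make_stub_agent("key_metrics", 1),
    "business": make_stub_agent("business", 2),
    "executive": make_stub_agent("executive", 3),
}
rounds_used, report_findings = run_until_majority(agents)
```

With these stubs, two of the three agents approve in the second round, so the loop stops there with six pooled findings.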

Results and Metrics

For evaluation, we use two datasets:

  • FinQABench:
    • Based on Apple’s 2022 10K SEC filing, containing 100 test cases with financial queries and expected responses.
    • Used to assess accuracy, hallucination prevention, and response quality in financial AI systems.

We compared the performance of the following workflows on this dataset:

  • Naive RAG without Agentic Contextual Chunking
  • Naive RAG with Agentic Contextual Chunking

| Workflow | Context Precision | Context Recall | Response Relevancy | Faithfulness | Factual Correctness |
| --- | --- | --- | --- | --- | --- |
| Naive RAG with Agentic Chunking | 0.904 | 0.849 | 0.928 | 0.901 | 0.518 |
| Naive RAG without Agentic Chunking | 0.801 | 0.782 | 0.883 | 0.828 | 0.449 |

It is clear from the results that the performance is enhanced by Agentic Chunking.
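
To quantify the improvement, the relative gain for each metric can be computed directly from the FinQABench scores reported above. The snippet below is a small worked calculation; the variable and function names are illustrative.

```python
METRICS = [
    "Context Precision",
    "Context Recall",
    "Response Relevancy",
    "Faithfulness",
    "Factual Correctness",
]
# Scores copied from the FinQABench comparison above.
with_chunking = [0.904, 0.849, 0.928, 0.901, 0.518]
without_chunking = [0.801, 0.782, 0.883, 0.828, 0.449]


def relative_gain_percent(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


gains = {
    metric: round(relative_gain_percent(new, old), 1)
    for metric, new, old in zip(METRICS, with_chunking, without_chunking)
}

for metric, gain in gains.items():
    print(f"{metric}: +{gain}%")
```

Under this calculation, the largest relative gain is in Factual Correctness (roughly 15%) and the smallest is in Response Relevancy (roughly 5%).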

  • SEC 10-Q dataset:
    • Includes four AAPL 10-Q filings and 39 complex financial queries requiring multi-step reasoning.
    • Designed to stress-test retrieval-augmented generation (RAG) models for multi-document financial analysis.

Here, we compared the following workflows:

  • OpenParser with Self-RAG [3]
  • OpenParser with Naive-RAG
  • Our Post-Retrieval Agentic Workflow with Contextual Chunking
  • Unstructured Parser with Naive-RAG
  • Unstructured Parser with Self-RAG

The results, tabulated below after the metric definitions, show that our Post-Retrieval Agentic Workflow with Contextual Chunking outperforms the other methods by a significant margin. This demonstrates its effectiveness on datasets that require reasoning over multiple documents to answer queries. In particular, improved Factual Correctness and Faithfulness are essential in financial due diligence, as they indicate that hallucination is reduced and information is preserved.

The following metrics from RAGAS [2] were used for evaluation:

  1. Context Precision: Measures how many retrieved chunks are relevant to the query. Higher is better.
  2. Context Recall: Assesses the proportion of relevant documents successfully retrieved.
  3. Response Relevancy: Evaluates how well the generated answer matches the query intent.
  4. Faithfulness: Ensures responses remain factually consistent with retrieved data.
  5. Factual Correctness: Compares answers to ground truth financial data for accuracy.

| Workflow | Context Precision | Context Recall | Response Relevancy | Faithfulness | Factual Correctness |
| --- | --- | --- | --- | --- | --- |
| OpenParser with Self-RAG | 0.524 | 0.272 | 0.364 | 0.659 | 0.280 |
| OpenParser with Naive-RAG | 0.521 | 0.253 | 0.374 | 0.623 | 0.273 |
| Unstructured Parser with Self-RAG | 0.499 | 0.259 | 0.269 | 0.596 | 0.269 |
| Unstructured Parser with Naive-RAG | 0.524 | 0.313 | 0.319 | 0.503 | 0.276 |
| Our Workflow | 0.690 | 0.396 | 0.701 | 0.788 | 0.384 |

Resilience and Error Handling:

Our Financial Agentic RAG system incorporates robust error management, ensuring uninterrupted functionality. This enhances resilience by enabling quick recovery from failures, rapid error identification, and timely debugging and resolution.

Tool-Specific Fallbacks: Key tools have designated backups, such as web search stepping in for API failures, or alternative tools within the Finance Agent. For example, when relevant data cannot be retrieved from the primary data source, the Finance Agent falls back to Tavily; if that also fails, it falls back to open web search through DuckDuckGo. Further fallback details are given in the appendix.

Error Handling: Nested try and except blocks manage failures effectively. Alongside the fallback mechanisms, extensive logging is implemented to ensure traceability. When a fallback is triggered, the callback function raises a warning to facilitate tracking and debugging.

Avoiding Exponential Back-Off: Excluded exponential back-off to maintain query speed, as most tool failures were observed to be binary.
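
The fallback behaviour described above can be sketched as an ordered chain of tools that is tried once each, with a warning logged whenever a fallback is used and no retry or back-off. This sketch uses a flat loop rather than nested try/except blocks, which behaves the same way for a linear chain. The tool functions are stubs with hypothetical names; no real Tavily or DuckDuckGo API calls are made.

```python
import logging
from typing import Callable, Optional, Sequence

logger = logging.getLogger("fallback_sketch")


def run_with_fallbacks(
    query: str,
    tools: Sequence[tuple[str, Callable[[str], str]]],
) -> Optional[str]:
    """Try each tool once, in order; return the first non-empty result."""
    for position, (name, tool) in enumerate(tools):
        try:
            result = tool(query)
        except Exception as exc:
            # Failures are treated as binary: no retry, move to the next tool.
            logger.warning("Tool %s failed (%s); trying next fallback", name, exc)
            continue
        if result:
            if position > 0:
                logger.warning("Answered by fallback tool %s", name)
            return result
        logger.warning("Tool %s returned no data; trying next fallback", name)
    return None


# Stub tools (hypothetical) simulating the chain described above.
def primary_source(query: str) -> str:
    raise ConnectionError("primary source unavailable")


def tavily_stub(query: str) -> str:
    return ""  # simulates a search that finds nothing relevant


def web_search_stub(query: str) -> str:
    return f"web result for: {query}"


answer = run_with_fallbacks(
    "example query",
    [
        ("primary", primary_source),
        ("tavily", tavily_stub),
        ("web_search", web_search_stub),
    ],
)
```

Because each tool is attempted only once, the worst-case latency is the sum of one call per tool, which matches the stated choice to avoid exponential back-off.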

Seamless User Experience with Dual-Mode Interface

FA3STER provides an intuitive, transparent, and data-driven interface designed for financial professionals, analysts, and investors. Users can switch between two core modes for flexibility and depth in financial analysis.

1. Chat Mode: Interactive Financial Query Resolution

  • Users can ask real-time financial questions, and FA3STER generates accurate, data-backed responses.
  • Built on Pathway’s infrastructure, the system provides a clear, step-by-step breakdown of how each response is generated by live streaming the agentic flow.
  • Offers full transparency, allowing users to track how financial insights are derived.

2. Report Generation Mode: Automated Due Diligence Reports

  • Users enter a company’s name, and FA3STER’s Agentic RAG framework autonomously retrieves and processes financial data.
  • Generates concise yet detailed FDD reports, summarizing key financial insights such as:
    • Revenue shares
    • Global market penetration
    • Key financial performance indicators
  • The Next.js-powered UI and WebSockets ensure real-time visibility into agent activity.
  • The final report is saved locally, while an interactive dashboard visualizes insights for strategic decision-making.

Responsible AI: Security, Compliance & Transparency

Our AI architecture incorporates guardrails to keep interactions secure, ethical, and compliant. These safeguards block irrelevant or unsafe queries, including those involving defamation, privacy violations, hate speech, and intellectual property concerns, upholding privacy and ethical standards.

Llama Guard's Safety Guardrail [4]:

  • Powered by Llama-3.1 8B, Llama Guard classifies queries as safe or unsafe.
  • Detects 14 predefined risk categories (e.g., privacy violations, hate speech, defamation).
  • Ensures secure and compliance-driven user interactions.
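
A guardrail gate of this kind can be sketched as a small parser over the classifier's text verdict. The sketch assumes the classifier returns either `safe`, or `unsafe` followed by a line of comma-separated category codes; this output format and the category labels below are assumptions based on the public Llama Guard 3 documentation and should be checked against the model card. The `Verdict` class and `parse_guard_output` function are hypothetical names, and no model is called here.

```python
from dataclasses import dataclass

# Assumed subset of hazard codes relevant to the categories named above.
CATEGORY_LABELS = {
    "S5": "Defamation",
    "S7": "Privacy",
    "S8": "Intellectual Property",
    "S10": "Hate",
}


@dataclass
class Verdict:
    safe: bool
    categories: list[str]


def parse_guard_output(raw: str) -> Verdict:
    """Parse a 'safe' / 'unsafe' verdict; anything unrecognized is blocked."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if lines and lines[0].lower() == "safe":
        return Verdict(safe=True, categories=[])
    codes: list[str] = []
    if len(lines) > 1:
        codes = [code.strip() for code in lines[1].split(",") if code.strip()]
    # Fail closed: empty or malformed output is treated as unsafe.
    return Verdict(safe=False, categories=codes)


def describe(verdict: Verdict) -> str:
    """Human-readable reason that a UI or log line could display."""
    if verdict.safe:
        return "allowed"
    labels = [CATEGORY_LABELS.get(code, code) for code in verdict.categories]
    return "blocked: " + (", ".join(labels) if labels else "unclassified")
```

Treating malformed output as unsafe is a deliberate design choice in this sketch: in a compliance-sensitive setting, a classifier failure should not silently let a query through.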

PII Guardrail [5]:

  • Initially explored Guardrail.ai with Presidio Analyzer & Anonymizer for PII detection.
  • Found the approach unnecessary and resource-intensive, leading to its removal for workflow optimization.

Transparency through Socket Communication:

  • FA3STER’s backend and frontend are connected via socket communication to track query execution in real time.
  • Every query’s processing path is logged and visually represented, giving users a clear, step-by-step breakdown of data flow.
  • Enhances trust, explainability, and system reliability.

Conclusion: Transforming Financial Due Diligence with FA3STER

FA3STER redefines financial due diligence by combining Pathway’s dynamic capabilities with an agentic RAG architecture. Through intelligent document retrieval, contextual chunking, and autonomous multi-agent processing, it delivers highly accurate, efficient, and scalable financial insights.

Rigorous testing confirms FA3STER’s superior performance over traditional RAG systems, minimizing errors while enhancing speed, transparency, and reliability. By leveraging Pathway’s real-time streaming and indexing framework, FA3STER ensures seamless financial analysis, adapting to evolving datasets with unparalleled precision.

As AI-driven financial intelligence continues to evolve, FA3STER stands at the forefront—setting new standards for automation, accuracy, and decision-making in the financial sector.


Authors:


Pathway Community

Multiple authors
