
Build a smart financial assistant with LlamaParse and Gemini 3.1

Google Developers Blog · March 31, 2026 · 1 min read

This blog post introduces a workflow for extracting high-quality data from complex, unstructured documents by combining LlamaParse with Gemini 3.1 models. It demonstrates an event-driven architecture that uses Gemini 3.1 Pro for agentic parsing of dense financial tables and Gemini 3.1 Flash for cost-effective summarization. By following the provided tutorial, developers can build a personal finance assistant capable of transforming messy brokerage statements into structured, human-readable insights.

MARCH 23, 2026

Extracting text from unstructured documents is a classic developer headache. For decades, traditional Optical Character Recognition (OCR) systems have struggled with complex layouts, often turning multi-column PDFs, embedded images, and nested tables into an unreadable mess of plain text.

Today, the multimodal capabilities of large language models (LLMs) finally make reliable document understanding possible.

LlamaParse bridges the gap between traditional OCR and vision-language agentic parsing. It delivers state-of-the-art text extraction across PDFs, presentations, and images.

In this post, you will learn how to use Gemini to power LlamaParse, extract high-quality text and tables from unstructured documents, and build an intelligent personal finance assistant. As a reminder, Gemini models may make mistakes and should not be relied upon for professional advice.

Why LlamaParse?

In many cases, LLMs can already perform this task effectively. However, when working with large document collections or highly variable formats, consistency and reliability become harder to guarantee.

Dedicated tools like LlamaParse complement LLM capabilities by introducing preprocessing steps and customizable parsing instructions, which help structure complex elements such as large tables or dense text. In general parsing benchmarks, this approach has shown around a 13–15% improvement compared to processing raw documents directly.

The use case: parsing brokerage statements

Brokerage statements represent the ultimate document parsing challenge. They contain dense financial jargon, complex nested tables, and dynamic layouts.

To help users understand their financial situation, you need a workflow that not only parses the file, but explicitly extracts the tables and explains the data through an LLM.

Because of these advanced reasoning and multimodal requirements, Gemini 3.1 Pro is the perfect fit as the underlying model. It balances a massive context window with native spatial layout comprehension.

The workflow operates in four stages:

  • Ingest: You submit a PDF to the LlamaParse engine.
  • Route: The engine parses the document and emits a ParsingDoneEvent.
  • Extract: This event triggers two parallel tasks — text extraction and table extraction — that run concurrently to minimize latency.
  • Synthesize: Once both extractions complete, Gemini generates a human-readable summary.
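The four stages above can be sketched as a small, library-free event relay. All names here mirror the tutorial but the bodies are placeholders, not LlamaParse or Gemini calls:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class FileEvent:            # Ingest: a PDF path enters the workflow
    input_file: str

@dataclass
class ParsingDoneEvent:     # Route: parsing finished, fan out to extractors
    markdown: str

async def extract_text(ev: ParsingDoneEvent) -> str:
    return ev.markdown      # placeholder for real text extraction

async def extract_tables(ev: ParsingDoneEvent) -> list[str]:
    # Placeholder: treat markdown pipe-lines as table rows
    return [l for l in ev.markdown.splitlines() if l.startswith("|")]

async def pipeline(start: FileEvent) -> str:
    # Stand-in for the LlamaParse call that produces the ParsingDoneEvent
    parsed = ParsingDoneEvent(markdown=f"# Parsed {start.input_file}\n| A | B |")
    # Extract: both tasks are triggered by the same event and run concurrently
    text, tables = await asyncio.gather(extract_text(parsed), extract_tables(parsed))
    # Synthesize: in the real workflow this is the Gemini summarization call
    return f"summary of {len(text)} chars and {len(tables)} table row(s)"

print(asyncio.run(pipeline(FileEvent("statement.pdf"))))
```

The point of the sketch is the shape: one event fans out to parallel consumers, and a final step joins their results.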

This two-model architecture is a deliberate design choice: Gemini 3.1 Pro handles the hard layout-comprehension work during parsing, while Gemini 3.1 Flash handles the final summarization, optimizing for both accuracy and cost.

You can find the complete code for this tutorial in the LlamaParse x Gemini demo GitHub repository.

Setting up the environment

First, install the necessary Python packages for LlamaCloud, LlamaIndex workflows, and the Google GenAI SDK.

# with pip
pip install llama-cloud-services llama-index-workflows pandas google-genai

# with uv
uv add llama-cloud-services llama-index-workflows pandas google-genai

Shell

Next, export your API keys as environment variables. Get a Gemini API key from AI Studio, and a LlamaCloud API key from the console. Security Note: Never hardcode your API keys in your application source code.
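A small startup guard in Python makes a missing key fail fast with a clear message instead of surfacing as a cryptic auth error mid-workflow. The helper name below is illustrative, not part of the tutorial's code:

```python
import os

def require_env(name: str) -> str:
    """Return the named environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Call at startup so misconfiguration is caught before any parsing begins:
# llama_key = require_env("LLAMA_CLOUD_API_KEY")
# gemini_key = require_env("GEMINI_API_KEY")
```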

export LLAMA_CLOUD_API_KEY="your_llama_cloud_key"
export GEMINI_API_KEY="your_google_api_key"

Shell

Step 1: Create and use the parser

The first step in your workflow is parsing. You create a LlamaParse client backed by Gemini 3.1 Pro and define it in resources.py so you can inject it into your workflow as a resource:

def get_llama_parse() -> LlamaParse:
    return LlamaParse(
        api_key=os.getenv("LLAMA_CLOUD_API_KEY"),
        parse_mode="parse_page_with_agent",
        model="gemini-3.1-pro",
        result_type=ResultType.MD,
    )

Python

The parse_page_with_agent mode applies a layer of agentic iteration guided by Gemini to correct and format OCR results based on visual context.

In workflow.py, define the events, state, and the parsing step:

class BrokerageStatementWorkflow(Workflow):
    @step
    async def parse_file(
        self,
        ev: FileEvent,
        ctx: Context[WorkflowState],
        parser: Annotated[LlamaParse, Resource(get_llama_parse)],
    ) -> ParsingDoneEvent | OutputEvent:
        result = cast(ParsingJobResult, await parser.aparse(file_path=ev.input_file))
        async with ctx.store.edit_state() as state:
            state.parsing_job_result = result
        return ParsingDoneEvent()

Python

Notice that you do not process parsing results immediately. Instead, you store them in the global WorkflowState so they are available for the extraction steps that follow.
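Conceptually, edit_state behaves like an async context manager over a shared state object, serializing concurrent writes. A simplified, library-free model of that behavior (the lock-based implementation is an assumption for illustration, not LlamaIndex's actual code):

```python
import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass

@dataclass
class WorkflowState:
    parsing_job_result: object = None

class Store:
    def __init__(self) -> None:
        self._state = WorkflowState()
        self._lock = asyncio.Lock()

    @asynccontextmanager
    async def edit_state(self):
        # Steps editing state concurrently take turns instead of
        # clobbering each other's writes
        async with self._lock:
            yield self._state

async def demo() -> object:
    store = Store()
    async with store.edit_state() as state:
        state.parsing_job_result = "parsed!"
    return store._state.parsing_job_result

print(asyncio.run(demo()))  # parsed!
```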

Step 2: Extract the text and tables

To provide the LLM with the context required to explain the financial statement, you need to extract the full markdown text and the tabular data. Add the extraction steps to your BrokerageStatementWorkflow class (see the full implementation in workflow.py):

@step
async def extract_text(
    self, ev: ParsingDoneEvent, ctx: Context[WorkflowState]
) -> TextExtractionDoneEvent:
    ...  # Extraction logic omitted for brevity. See repo.

@step
async def extract_tables(
    self, ev: ParsingDoneEvent, ctx: Context[WorkflowState], ...
) -> TablesExtractionDoneEvent:
    ...  # Extraction logic omitted for brevity. See repo.

Python
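Since the parser's result type is markdown, the table-extraction step ultimately has to turn markdown tables back into rows and columns. A library-free sketch of that inner conversion (the helper name and sample table are illustrative):

```python
def parse_md_table(md: str) -> list[dict[str, str]]:
    """Convert a simple markdown pipe table into a list of row dicts."""
    lines = [l.strip() for l in md.strip().splitlines() if l.strip()]
    # First line is the header row; second is the |---|---| separator.
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:
        cells = [c.strip() for c in line.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows

table = """
| Symbol | Quantity | Market Value |
|--------|----------|--------------|
| VTI    | 10       | $2,450.00    |
| BND    | 25       | $1,800.00    |
"""
print(parse_md_table(table)[0]["Symbol"])  # VTI
```

Real brokerage tables are messier (merged cells, footnotes), which is exactly why the tutorial leans on agentic parsing upstream, but the target shape is the same.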

Because both steps listen for the same ParsingDoneEvent, LlamaIndex Workflows automatically executes them in parallel. This means your text and table extractions run concurrently — cutting overall pipeline latency and making the architecture naturally scalable as you add more extraction tasks.
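The latency win is easy to verify with plain asyncio: two simulated 0.2-second extractions launched together finish in roughly 0.2 seconds total, not 0.4 (the timings and function bodies are illustrative):

```python
import asyncio, time

async def extract_text() -> str:
    await asyncio.sleep(0.2)   # simulate a slow extraction call
    return "text"

async def extract_tables() -> str:
    await asyncio.sleep(0.2)   # simulate a slow extraction call
    return "tables"

async def timed_run() -> float:
    start = time.perf_counter()
    # Fan-out, like the two steps listening for ParsingDoneEvent
    await asyncio.gather(extract_text(), extract_tables())
    return time.perf_counter() - start

elapsed = asyncio.run(timed_run())
print(f"both extractions finished in {elapsed:.2f}s")  # ~0.2s, not ~0.4s
```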

Step 3: Generate the summary

With the data extracted, you can prompt Gemini to generate a summary in accessible, non-technical language.

Configure the LLM client and prompt template in resources.py. Here, you use Gemini 3.1 Flash for the final summarization, as it offers low latency and cost efficiency for text aggregation tasks.

The final synthesis step uses ctx.collect_events to wait for both extractions to complete before calling the Gemini API.
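collect_events can be pictured as a buffer that stores each incoming event and yields the full set only once every expected type has arrived. A simplified, library-free model of that behavior (not LlamaIndex's actual implementation):

```python
class EventCollector:
    """Buffers events; returns the full set only once every type has arrived."""
    def __init__(self) -> None:
        self._seen: dict[type, object] = {}

    def collect(self, ev: object, expected: list[type]):
        self._seen[type(ev)] = ev
        if all(t in self._seen for t in expected):
            return [self._seen[t] for t in expected]
        return None   # still waiting on at least one event type

class TextDone: pass
class TablesDone: pass

c = EventCollector()
print(c.collect(TextDone(), [TablesDone, TextDone]))    # None: tables not in yet
print(c.collect(TablesDone(), [TablesDone, TextDone]))  # both events, in order
```

Returning None until the set is complete is why the real step short-circuits with `return None` on early invocations.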

@step
async def ask_llm(
    self,
    ev: TablesExtractionDoneEvent | TextExtractionDoneEvent,
    ctx: Context[WorkflowState],
    llm: Annotated[GenAIClient, Resource(get_llm)],
    template: Annotated[Template, Resource(get_prompt_template)],
) -> OutputEvent:
    if ctx.collect_events(ev, [TablesExtractionDoneEvent, TextExtractionDoneEvent]) is None:
        return None
    # Full prompt and LLM call available in repo.

Python

Running the workflow

To tie it all together, the main.py entry point creates and runs the workflow:

wf = BrokerageStatementWorkflow(timeout=600)
result = await wf.run(start_event=FileEvent(input_file=input_file))

Python

To test the workflow, download a sample statement from the LlamaIndex datasets:

curl -L https://raw.githubusercontent.com/run-llama/llama-datasets/main/llama_agents/bank_statements/brokerage_statement.pdf > brokerage_statement.pdf

Shell

Run the workflow:

# Using pip
python3 main.py brokerage_statement.pdf

# Using uv
uv run run-workflow brokerage_statement.pdf

Shell

You now have a fully functional personal finance assistant running in your terminal, capable of analyzing complex financial PDFs.

Next steps

AI pipelines are only as good as the data you feed them. By combining Gemini 3.1 Pro's multimodal reasoning with LlamaParse's agentic ingestion, you ensure your applications have the full, structured context they need — not just flattened text.

When you base your architecture on event-driven, stateful workflows, like the parallel extractions demonstrated here, you build systems that are fast, scalable, and resilient. Still, double-check outputs before relying on them.

Ready to implement this in production? Explore LlamaParse and the Gemini API documentation to experiment with multimodal generation, and dive into the full code in the GitHub repository.
