Build a smart financial assistant with LlamaParse and Gemini 3.1
This blog post introduces a workflow for extracting high-quality data from complex, unstructured documents by combining LlamaParse with Gemini 3.1 models. It demonstrates an event-driven architecture that uses Gemini 3.1 Pro for agentic parsing of dense financial tables and Gemini 3.1 Flash for cost-effective summarization. By following the provided tutorial, developers can build a personal finance assistant capable of transforming messy brokerage statements into structured, human-readable insights.
MARCH 23, 2026
Extracting text from unstructured documents is a classic developer headache. For decades, traditional Optical Character Recognition (OCR) systems have struggled with complex layouts, often turning multi-column PDFs, embedded images, and nested tables into an unreadable mess of plain text.
Today, the multimodal capabilities of large language models (LLMs) finally make reliable document understanding possible.
LlamaParse bridges the gap between traditional OCR and vision-language agentic parsing. It delivers state-of-the-art text extraction across PDFs, presentations, and images.
In this post, you will learn how to use Gemini to power LlamaParse, extract high-quality text and tables from unstructured documents, and build an intelligent personal finance assistant. As a reminder, Gemini models may make mistakes and should not be relied upon for professional advice.
Why LlamaParse?
In many cases, LLMs can already perform this task effectively on their own. However, when working with large document collections or highly variable formats, consistency and reliability become harder to guarantee.
Dedicated tools like LlamaParse complement LLM capabilities by introducing preprocessing steps and customizable parsing instructions, which help structure complex elements such as large tables or dense text. In general parsing benchmarks, this approach has shown around a 13–15% improvement compared to processing raw documents directly.
The use case: parsing brokerage statements
Brokerage statements represent the ultimate document parsing challenge. They contain dense financial jargon, complex nested tables, and dynamic layouts.
To help users understand their financial situation, you need a workflow that not only parses the file but also explicitly extracts the tables and explains the data through an LLM.
Because of these advanced reasoning and multimodal requirements, Gemini 3.1 Pro is the perfect fit as the underlying model. It balances a massive context window with native spatial layout comprehension.
The workflow operates in four stages:
- Ingest: You submit a PDF to the LlamaParse engine.
- Route: The engine parses the document and emits a ParsingDoneEvent.
- Extract: This event triggers two parallel tasks — text extraction and table extraction — that run concurrently to minimize latency.
- Synthesize: Once both extractions complete, Gemini generates a human-readable summary.
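The events that drive these stages can be sketched with plain dataclasses. This is only an illustration of the data each stage passes along: the real project defines these as `llama-index-workflows` `Event` subclasses, and every field except `FileEvent.input_file` is an assumption, not taken from the repo.

```python
from dataclasses import dataclass

# Illustrative event types for the four-stage workflow. Only
# FileEvent.input_file appears in the original code; other fields
# are hypothetical stand-ins for the payloads each stage carries.

@dataclass
class FileEvent:
    input_file: str  # path to the PDF submitted for parsing

@dataclass
class ParsingDoneEvent:
    pass  # signals that LlamaParse has finished with the document

@dataclass
class TextExtractionDoneEvent:
    text: str  # full markdown text of the statement (assumed field)

@dataclass
class TablesExtractionDoneEvent:
    tables: list  # extracted tables (assumed field)

@dataclass
class OutputEvent:
    summary: str  # final human-readable summary (assumed field)

ev = FileEvent(input_file="brokerage_statement.pdf")
print(ev.input_file)
```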
This two-model architecture is a deliberate design choice: Gemini 3.1 Pro handles the hard layout comprehension during parsing, while Gemini 3.1 Flash handles the final summarization — optimizing for both accuracy and cost.
You can find the complete code for this tutorial in the LlamaParse x Gemini demo GitHub repository.
Setting up the environment
First, install the necessary Python packages for LlamaCloud, LlamaIndex workflows, and the Google GenAI SDK.
```shell
# with pip
pip install llama-cloud-services llama-index-workflows pandas google-genai

# with uv
uv add llama-cloud-services llama-index-workflows pandas google-genai
```
Next, export your API keys as environment variables. Get a Gemini API key from AI Studio, and a LlamaCloud API key from the console. Security Note: Never hardcode your API keys in your application source code.
```shell
export LLAMA_CLOUD_API_KEY="your_llama_cloud_key"
export GEMINI_API_KEY="your_google_api_key"
```
Step 1: Create and use the parser
The first step in your workflow is parsing. You create a LlamaParse client backed by Gemini 3.1 Pro and define it in resources.py so you can inject it into your workflow as a resource:
```python
def get_llama_parse() -> LlamaParse:
    return LlamaParse(
        api_key=os.getenv("LLAMA_CLOUD_API_KEY"),
        parse_mode="parse_page_with_agent",
        model="gemini-3.1-pro",
        result_type=ResultType.MD,
    )
```
The parse_page_with_agent mode applies a layer of agentic iteration guided by Gemini to correct and format OCR results based on visual context.
In workflow.py, define the events, state, and the parsing step:
```python
class BrokerageStatementWorkflow(Workflow):
    @step
    async def parse_file(
        self,
        ev: FileEvent,
        ctx: Context[WorkflowState],
        parser: Annotated[LlamaParse, Resource(get_llama_parse)],
    ) -> ParsingDoneEvent | OutputEvent:
        result = cast(ParsingJobResult, await parser.aparse(file_path=ev.input_file))
        async with ctx.store.edit_state() as state:
            state.parsing_job_result = result
        return ParsingDoneEvent()
```
Notice that you do not process parsing results immediately. Instead, you store them in the global WorkflowState so they are available for the extraction steps that follow.
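The shared state that carries the parsing result between steps can be sketched as a simple dataclass. The real `WorkflowState` is defined in workflow.py; every field here other than `parsing_job_result` is a hypothetical placeholder.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Sketch of the global workflow state. Only parsing_job_result appears
# in the original code; the other fields are assumed for illustration.

@dataclass
class WorkflowState:
    parsing_job_result: Optional[Any] = None   # raw LlamaParse output
    extracted_text: str = ""                   # markdown text (assumed field)
    extracted_tables: list = field(default_factory=list)  # assumed field

state = WorkflowState()
state.parsing_job_result = {"pages": 12}  # placeholder value, not a real result
```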
Step 2: Extract the text and tables
To provide the LLM with the context required to explain the financial statement, you need to extract the full markdown text and the tabular data. Add the extraction steps to your BrokerageStatementWorkflow class (see the full implementation in workflow.py):
```python
@step
async def extract_text(
    self, ev: ParsingDoneEvent, ctx: Context[WorkflowState]
) -> TextExtractionDoneEvent:
    ...  # Extraction logic omitted for brevity. See repo.

@step
async def extract_tables(
    self, ev: ParsingDoneEvent, ctx: Context[WorkflowState], ...
) -> TablesExtractionDoneEvent:
    ...  # Extraction logic omitted for brevity. See repo.
```
Because both steps listen for the same ParsingDoneEvent, LlamaIndex Workflows automatically executes them in parallel. This means your text and table extractions run concurrently — cutting overall pipeline latency and making the architecture naturally scalable as you add more extraction tasks.
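The latency benefit of this fan-out can be illustrated with a plain asyncio sketch. The task bodies and timings below are illustrative stand-ins, not code from the repo; the point is that two awaits driven by the same trigger finish in roughly the time of the slower one.

```python
import asyncio
import time

# Two extraction tasks triggered by the same event run concurrently,
# so total latency tracks the slower task, not the sum of both.

async def extract_text_task() -> str:
    await asyncio.sleep(0.2)   # stand-in for text extraction work
    return "text"

async def extract_tables_task() -> str:
    await asyncio.sleep(0.3)   # stand-in for table extraction work
    return "tables"

async def main() -> float:
    start = time.perf_counter()
    results = await asyncio.gather(extract_text_task(), extract_tables_task())
    elapsed = time.perf_counter() - start
    print(results)  # both results arrive together, in ~0.3s rather than 0.5s
    return elapsed

elapsed = asyncio.run(main())
```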
Step 3: Generate the summary
With the data extracted, you can prompt Gemini to generate a summary in accessible, non-technical language.
Configure the LLM client and prompt template in resources.py. Here, you use Gemini 3.1 Flash for the final summarization, as it offers low latency and cost efficiency for text aggregation tasks.
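The `ask_llm` step below takes a `Template` resource from `get_prompt_template`. The actual prompt wording lives in the repo; the following is only a plausible sketch of that resource using Python's standard-library `string.Template`, with made-up prompt text and placeholder names.

```python
from string import Template

# Hypothetical sketch of get_prompt_template from resources.py.
# The real prompt is in the repo; $text and $tables are assumed
# placeholder names for the two extraction results.

def get_prompt_template() -> Template:
    return Template(
        "You are a personal finance assistant.\n"
        "Explain this brokerage statement in plain, non-technical language.\n\n"
        "Statement text:\n$text\n\n"
        "Extracted tables:\n$tables\n"
    )

prompt = get_prompt_template().substitute(text="<markdown here>", tables="<tables here>")
print(prompt)
```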
The final synthesis step uses ctx.collect_events to wait for both extractions to complete before calling the Gemini API.
```python
@step
async def ask_llm(
    self,
    ev: TablesExtractionDoneEvent | TextExtractionDoneEvent,
    ctx: Context[WorkflowState],
    llm: Annotated[GenAIClient, Resource(get_llm)],
    template: Annotated[Template, Resource(get_prompt_template)],
) -> OutputEvent:
    if ctx.collect_events(ev, [TablesExtractionDoneEvent, TextExtractionDoneEvent]) is None:
        return None
    # Full prompt and LLM call available in repo.
```
Running the workflow
To tie it all together, the main.py entry point creates and runs the workflow:
```python
wf = BrokerageStatementWorkflow(timeout=600)
result = await wf.run(start_event=FileEvent(input_file=input_file))
```
To test the workflow, download a sample statement from the LlamaIndex datasets:
```shell
curl -L https://raw.githubusercontent.com/run-llama/llama-datasets/main/llama_agents/bank_statements/brokerage_statement.pdf > brokerage_statement.pdf
```
Run the workflow:
```shell
# Using pip
python3 main.py brokerage_statement.pdf

# Using uv
uv run run-workflow brokerage_statement.pdf
```
You now have a fully functional personal finance assistant running in your terminal, capable of analyzing complex financial PDFs.
Next steps
AI pipelines are only as good as the data you feed them. By combining Gemini 3.1 Pro's multimodal reasoning with LlamaParse's agentic ingestion, you ensure your applications have the full, structured context they need — not just flattened text.
When you base your architecture on event-driven statefulness, like the parallel extractions demonstrated here, you build systems that are fast, scalable, and resilient. Double-check outputs before relying on them.
Ready to implement this in production? Explore LlamaParse and the Gemini API documentation to experiment with multimodal generation, and dive into the full code in the GitHub repository.
Google Developers Blog
https://developers.googleblog.com/build-a-smart-financial-assistant-with-llamaparse-and-gemini-31/
