Voice-to-Schema: Turning "Track My Invoices" Into a Real Table

Dev.to AI · by Jakub · April 5, 2026 · 6 min read

We rebuilt our NLP pipeline three times before it actually worked. Here's what went wrong each time and what we learned about the gap between what people say and what they mean.

The Problem Nobody Talks About

When we started building VoiceTables, we had a simple hypothesis: let people describe what they need, and generate a structured table from that description. User says "I need to track my invoices," system creates an invoices table with sensible columns. Easy, right?

Turns out spoken language and structured data are almost completely different things. The first version took about two weeks to build, and it took us maybe 20 minutes to realize it was broken.

Attempt 1: Naive Prompt Engineering

The first pipeline was embarrassingly simple. Take the transcript, send it to an LLM with a system prompt like "extract a table schema from this description," parse the JSON response.
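
A minimal sketch of that single-pass flow, for illustration only. The prompt wording beyond the quoted phrase, the `naive_schema` function, and the `fake_llm` stand-in are all assumptions; the real pipeline would call an actual model where the stub is.

```python
import json
from typing import Callable

# Hypothetical system prompt, paraphrasing the one quoted above.
SYSTEM_PROMPT = "Extract a table schema from this description. Reply with JSON only."


def naive_schema(transcript: str, call_llm: Callable[[str, str], str]) -> dict:
    """Single pass: transcript in, parsed JSON schema out."""
    raw = call_llm(SYSTEM_PROMPT, transcript)
    # json.loads raises json.JSONDecodeError if the reply is not valid JSON.
    return json.loads(raw)


def fake_llm(system_prompt: str, transcript: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return json.dumps({
        "table": "invoices",
        "columns": ["Client Name", "Invoice Amount", "Due Date", "Status"],
    })


schema = naive_schema(
    "Create a table with columns: client name, invoice amount, due date, status",
    fake_llm,
)
print(schema["columns"])
```

Nothing in this flow checks whether the transcript contained enough information to build a schema in the first place.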

It worked perfectly for clean inputs. "Create a table with columns: client name, invoice amount, due date, status" produced exactly what you'd expect.

Nobody talks like that.

Real inputs looked like this:

  • "uh, I need something for... like tracking stuff, you know, for my clients"

  • "make me a table, invoice things, the usual"

  • "so I've got these freelance gigs and I keep losing track of who paid me"

The third one is actually the most useful input. It tells you what the user needs (payment tracking), who they're working with (freelance clients), and what the pain point is (losing track of payments). But our v1 pipeline couldn't extract any of that. It would either hallucinate random columns or return a generic two-column table that helped nobody.

Attempt 2: Two-Stage Extraction

For the second version, we split the pipeline into two steps:

  • Intent extraction: figure out what domain the user is working in (invoicing, project management, inventory, etc.) and what they actually want to track

  • Schema generation: given the intent, generate an appropriate table structure

This was better. The intent layer caught things like "freelance gigs" mapping to freelancer invoicing, which gave the schema generator much better context.
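
The two stages might be wired together roughly like this. This is a sketch under assumptions: the keyword cues in `DOMAIN_CUES` stand in for whatever the real intent layer does, and `schema_prompt` only shows how the extracted intent could be passed to the second stage as context.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    domain: str  # e.g. "invoicing", "inventory", or "unknown"
    goal: str    # what the user said they want to track


# Hypothetical keyword cues; a real intent layer would likely use a model.
DOMAIN_CUES = {
    "invoicing": ("invoice", "paid", "freelance"),
    "inventory": ("stock", "inventory", "warehouse"),
}


def extract_intent(transcript: str) -> Intent:
    """Stage 1: guess the domain from cues in the transcript."""
    text = transcript.lower()
    for domain, cues in DOMAIN_CUES.items():
        if any(cue in text for cue in cues):
            return Intent(domain, transcript.strip())
    return Intent("unknown", transcript.strip())


def schema_prompt(intent: Intent) -> str:
    """Stage 2 input: the schema generator sees the intent, not just raw text."""
    return (
        f"Domain: {intent.domain}\n"
        f"User request: {intent.goal}\n"
        "Propose a table schema for this request as JSON."
    )


intent = extract_intent(
    "so I've got these freelance gigs and I keep losing track of who paid me"
)
prompt = schema_prompt(intent)
print(intent.domain)
```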

But we hit a new problem: ambiguity in spoken language vs. typed input.

When someone types "client name," they mean a text column called "Client Name." When someone says "client name," they might mean:

  • The name of their client (text column)

  • A reference to an existing clients table (foreign key)

  • "the client" as filler while they think about what to say next

Spoken language has pauses, restarts, filler words, self-corrections. "I need a table for... no wait... like, a list of my clients and their... the projects, and how much each one... you know, the budget for each project."

That sentence contains at least three potential columns (client, project, budget) and a relationship (clients have projects). Our v2 would sometimes generate six columns because it treated "no wait" and "you know" as potential data points.

Attempt 3: What Actually Works

The third rewrite introduced something we should have done from the start: a confidence-scored extraction with a clarification loop.

The pipeline now works like this:

  • Transcript cleanup: strip filler words, normalize speech patterns, handle self-corrections ("no wait" means "ignore what I just said")

  • Entity extraction with confidence: each potential column gets a confidence score. "Budget" from the sentence above gets 0.9. "Projects" gets 0.85. "The" gets filtered out entirely.

  • Schema proposal: generate the table structure, but only include columns above a confidence threshold

  • Gap detection: identify what's probably missing. If someone mentions invoices but no date column, that's a gap worth asking about.
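
The steps above can be sketched as follows. Everything here is illustrative: the filler list, the rule that a correction marker discards what came before it, the 0.7 threshold, the 0.05 score for "The", and the `EXPECTED` gap rules are assumptions. Only the 0.9 and 0.85 scores come from the example above.

```python
import re
from dataclasses import dataclass

# Crude filler list; "like" is only dropped when followed by a comma so that
# phrases such as "make it like a CRM" survive.
FILLERS = re.compile(r"\b(?:uh|um|you know|like(?=,))\b,?\s*", re.IGNORECASE)
# Assumption: "no wait" discards everything said before it.
CORRECTION = re.compile(r"\bno wait\b[,.\s]*", re.IGNORECASE)


def clean_transcript(text: str) -> str:
    text = CORRECTION.split(text)[-1]    # keep what follows the last correction
    text = FILLERS.sub("", text)         # strip filler words
    text = re.sub(r"\.{2,}", " ", text)  # hesitation ellipses become spaces
    return re.sub(r"\s+", " ", text).strip()


@dataclass
class Candidate:
    name: str
    confidence: float


def propose_columns(candidates: list[Candidate], threshold: float = 0.7) -> list[str]:
    """Keep only candidate columns at or above the confidence threshold."""
    return [c.name for c in candidates if c.confidence >= threshold]


# Hypothetical gap rules: columns a domain usually needs.
EXPECTED = {"invoicing": ["Due Date", "Amount", "Status"]}


def find_gaps(domain: str, columns: list[str]) -> list[str]:
    """List expected columns that the proposal does not contain."""
    present = {c.lower() for c in columns}
    return [e for e in EXPECTED.get(domain, []) if e.lower() not in present]


cleaned = clean_transcript(
    "I need a table for... no wait... like, a list of my clients and their... "
    "the projects, and how much each one... you know, the budget for each project."
)
columns = propose_columns(
    [Candidate("Budget", 0.9), Candidate("Projects", 0.85), Candidate("The", 0.05)]
)
gaps = find_gaps("invoicing", ["Client Name", "Amount"])
print(cleaned, columns, gaps, sep="\n")
```

Regex cleanup like this is brittle on real speech, so treat it as a picture of the data flow rather than a recommended implementation.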

The key insight was that we don't need to get it perfect on the first pass. We just need to get it good enough that the user can see what we understood and correct us quickly. "I see you want to track invoices for freelance clients. I've set up columns for Client Name, Project, Amount, and Status. Want me to add anything else?"

That conversational correction loop is way more natural than trying to parse everything perfectly from a single voice input.

Failure Modes We Still Deal With

It's not all solved. Some recurring edge cases:

Language mixing. We have users who switch between languages mid-sentence. Our pipeline handles Czech and English separately, but code-switching mid-sentence still trips it up sometimes.

Implicit schemas. "Make it like a CRM" assumes shared knowledge about what a CRM table looks like. We built a library of common schema templates (CRM, invoice tracker, project board, inventory) that the intent layer can match against. It covers maybe 70% of cases.
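
A template lookup of this kind might look like the sketch below. The template names come from the list above, but every column list is a hypothetical placeholder, and the whole-phrase match is a simplification of whatever the intent layer actually does.

```python
import re

# Template names from the article; all column lists are invented placeholders.
TEMPLATES = {
    "crm": ["Contact Name", "Company", "Email", "Stage"],
    "invoice tracker": ["Client Name", "Amount", "Due Date", "Status"],
    "project board": ["Task", "Owner", "Status"],
    "inventory": ["Item", "Quantity", "Location"],
}


def match_template(utterance: str):
    """Return (name, columns) for the first template named in the utterance."""
    text = utterance.lower()
    for name, columns in TEMPLATES.items():
        if re.search(rf"\b{re.escape(name)}\b", text):
            return name, columns
    return None  # no template matched; fall back to the normal extraction path


print(match_template("Make it like a CRM"))
```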

Overspecification. Some users describe every single column in detail, including data types and validation rules, all in one breath. The pipeline gets confused because it's optimized for the messy, underspecified case. We're still tuning the balance.

Numbers

After the third rewrite:

  • First-pass schema accuracy went from ~40% (v1) to ~78% (v3), measured as the share of sessions where the user accepted the generated schema without modifications

  • The number of clarification questions needed to get from voice input to a usable table dropped from 3+ to usually 0-1

  • The cleanup step alone (stripping fillers, handling corrections) improved extraction accuracy by about 15 percentage points
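
For reference, the first-pass accuracy metric can be computed as below. The session record shape (an `edits` count per session) is an assumption, and the sample sessions are made up purely to exercise the function; they are not our data.

```python
def first_pass_accuracy(sessions: list[dict]) -> float:
    """Share of sessions where the generated schema was accepted unmodified."""
    if not sessions:
        return 0.0
    accepted = sum(1 for s in sessions if s["edits"] == 0)
    return accepted / len(sessions)


# Made-up sessions: four accepted as-is, one needing two edits.
sample = [{"edits": 0}, {"edits": 0}, {"edits": 2}, {"edits": 0}, {"edits": 0}]
rate = first_pass_accuracy(sample)
print(f"{rate:.0%}")
```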

What I'd Do Differently

If I were starting from scratch, I'd skip the "parse everything from one input" approach entirely. Start with the clarification loop from day one. People are surprisingly patient with "let me make sure I understood you" if the follow-up question is smart.

Also, collect real voice inputs as early as possible. We spent two weeks optimizing for typed test inputs that looked nothing like actual speech. The gap between "create a table with columns name, email, phone" and "uh yeah so I need like a contacts thing" is massive, and you won't close it without real data.

I'm building VoiceTables as part of the Inithouse portfolio, where we ship AI-powered tools across different verticals. Some of our other projects include Be Recommended (check if AI chatbots recommend your brand), Watching Agents (AI prediction platform), and Audit Vibecoding (automated audits for AI-generated code). If you're building voice-first interfaces or have war stories about NLP pipelines, I'd love to hear about it in the comments.
