Applied NLP with LLMs: Beyond Black-Box Monoliths
In this talk, Ines Montani presents practical solutions for using the latest state-of-the-art models in real-world applications and for distilling their knowledge into smaller, faster components.
Resources
A practical guide to human-in-the-loop distillation
https://explosion.ai/blog/human-in-the-loop-distillation
This blog post presents practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.
Applied NLP Thinking: How to Translate Problems into Solutions
https://explosion.ai/blog/applied-nlp-thinking
This blog post discusses some of the biggest challenges for applied NLP and translating business problems into machine learning solutions, including the distinction between utility and accuracy.
How S&P Global is making markets more transparent with NLP, spaCy and Prodigy
https://explosion.ai/blog/sp-global-commodities
A case study on S&P Global’s efficient information extraction pipelines for real-time commodities trading insights in a high-security environment using human-in-the-loop distillation.
How GitLab uses spaCy to analyze support tickets and empower their community
https://explosion.ai/blog/gitlab-support-insights
A case study on GitLab’s large-scale NLP pipelines for extracting actionable insights from support tickets and usage questions.
Using LLMs for human-in-the-loop distillation in Prodigy
https://prodi.gy/docs/large-language-models
Prodigy comes with preconfigured workflows for using LLMs to speed up and automate annotation and create datasets for distilling large generative models into more accurate, smaller, faster and fully private task-specific components.
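The pre-annotation loop these workflows automate can be sketched in plain Python. Everything here is a mock, not the real Prodigy API: `llm_suggest` stands in for an actual LLM call, and the labels mirror the r/cooking case study from the talk.

```python
# Mock sketch of human-in-the-loop distillation: an LLM proposes entity
# spans, a human reviews and corrects them, and the reviewed examples
# become training data for a small task-specific model.

def llm_suggest(text):
    """Stand-in for a zero-shot LLM call proposing (start, end, label) spans."""
    terms = [("paella", "DISH"), ("saffron", "INGREDIENT"),
             ("cast-iron pan", "EQUIPMENT")]
    spans = []
    for term, label in terms:
        start = text.find(term)
        if start != -1:
            spans.append((start, start + len(term), label))
    return spans

def human_review(suggestions, corrections):
    """A human accepts, relabels, or rejects (maps to None) each span."""
    reviewed = []
    for span in suggestions:
        fixed = corrections.get(span, span)
        if fixed is not None:
            reviewed.append(fixed)
    return reviewed

text = "I made paella with saffron in my cast-iron pan."
suggested = llm_suggest(text)
gold = human_review(suggested, corrections={})  # reviewer accepts all here
training_example = {"text": text, "spans": gold}
print(len(training_example["spans"]))
```

In the real workflow the reviewed examples feed a spaCy training run; the point is that the human only corrects suggestions instead of annotating from scratch, which is where the speedup comes from.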
Transcript
-
Ines Montani · Explosion
-
LLMs like Falcon, Mixtral and GPT-4 give good contextual results.
-
Close the gap between prototype & production
How to avoid the prototype plateau?
📝 standardize inputs and outputs
📈 start with evaluation
🔮 assess utility, not just accuracy
🛠 work on data iteratively
💬 consider structure and ambiguity of natural language
explosion.ai/blog/applied-nlp-thinking
-
Human in the loop: explosion.ai/blog/human-in-the-loop-distillation
-
Case Study: PyData NYC
• extracting dishes, ingredients and equipment from r/cooking Reddit posts
• used LLM during annotation
• 20× inference time speedup
• beat few-shot LLM baseline of 0.74 with task-specific model
8hr data dev time · 400mb model size · 2k+ words/second
spacy.fyi/pydata-nyc
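Comparisons like the 0.74 few-shot baseline are typically span-level F-scores. A minimal sketch, with invented spans rather than the talk's real evaluation data:

```python
# Toy span-level F-score (harmonic mean of precision and recall) over
# exact (start, end, label) matches. Spans below are illustrative only.

def f_score(gold, pred):
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # exact span matches
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = {(7, 13, "DISH"), (19, 26, "INGREDIENT"), (33, 46, "EQUIPMENT")}
llm_pred = {(7, 13, "DISH"), (19, 26, "INGREDIENT"), (30, 32, "DISH")}
print(round(f_score(gold, llm_pred), 2))  # 2 of 3 correct on both sides
```

A distilled task-specific model beats the LLM baseline when its score on held-out gold annotations exceeds the LLM's, at a fraction of the inference cost.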
-
Case Study: S&P Global
• real-time commodities trading insights by extracting structured attributes
• high-security environment
• used LLM during annotation
• 10× data development speedup with humans and model in the loop
• 8 market pipelines in production
99% F-score · 6mb model size · 16k+ words/second
explosion.ai/blog/sp-global-commodities
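"Extracting structured attributes" means turning free-text market messages into typed fields. A deliberately simplified sketch of the idea, using a regex where the real pipelines use trained spaCy components; the field names, pattern and headline are all invented for illustration:

```python
import re

# Hypothetical structured-attribute extraction from a market headline:
# pull volume, unit, commodity and price into a dict. Not S&P's method.
PATTERN = re.compile(
    r"(?P<volume>\d+(?:,\d+)?)\s*(?P<unit>mt|bbl)\s+"
    r"(?P<commodity>[A-Za-z ]+?)\s+at\s+\$(?P<price>\d+(?:\.\d+)?)"
)

def extract_attributes(headline):
    """Return a dict of attributes, or None if the headline doesn't match."""
    m = PATTERN.search(headline)
    return m.groupdict() if m else None

attrs = extract_attributes("Trader sold 5,000 mt copper cathode at $9450.50/mt")
print(attrs)
```

The advantage of a small trained component over patterns like this is robustness to phrasing variation, while staying tiny (6mb) and fully private.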
-
Break down larger problems. Make the problem easier. Reassess dependencies.
-
Case Study: GitLab
• extract actionable insights from support tickets and usage questions
• high-security environment
• easy to adapt to new scenarios and business questions
• separated general-purpose features from product-specific logic
1 year of support tickets · 6× speedup
explosion.ai/blog/gitlab-support-insights
-
Summary: Applied NLP & Gen AI
Iterate. The right tooling and mindset gets you past the “prototype plateau”.
Reason and refactor. The key to success lies in your data and may surprise you!
Stay ambitious. Don’t compromise on best practices, efficiency and privacy.
Explosion AI Blog
https://speakerdeck.com/inesmontani/applied-nlp-with-llms-beyond-black-box-monoliths