The $10 Billion Trust Data Market That AI Companies Can't See
AI companies are spending hundreds of millions on content and listings. None of it tells them whether a business is actually good.
The Spending Spree
The numbers are staggering. In the past 18 months, AI companies have signed over $1 billion in content licensing deals. OpenAI alone has agreements with News Corp ($250M+), the Associated Press, Hearst, Condé Nast, and at least a dozen other publishers. Meta signed multi-year deals with CNN, Fox News, People, and USA Today.
Reddit's data licensing hit $130 million per year from Google and OpenAI combined. Yelp's "other revenue" category grew 17% year-over-year, accelerating to 30-33% in Q4. Their 2026 revenue target is $1.475 billion.
DataLane raised $27 million in December 2025 to build an "identity graph of 20 million local businesses." Their product verifies that businesses exist — correct names, addresses, phone numbers, hours. They don't verify whether those businesses are any good.
All of this spending falls into two categories: content (what someone wrote) and listings (that entities exist). Neither category answers the question AI users actually ask: is this business worth my time and money?
The Trust Data Market Already Exists
Companies that sell trust-adjacent data are thriving. The market validates the demand.
Verisk generates $3 billion in annual revenue selling risk analytics to insurance companies. Their data comes from a cooperative pool of 1,850+ insurers contributing claims data. But it's actuarial — statistical risk models, not behavioral signals about individual businesses.
Dun & Bradstreet generates $2.4 billion selling the DUNS Number system and credit data on 600 million business records. Their Paydex score is "FICO for businesses." But it's based on self-reported trade credit data.
FICO generates $2 billion from credit scores. The data comes from three credit bureaus whose information is notoriously error-prone.
Trustpilot just posted a 320% increase in operating profit. AI search is the reason. Trustpilot is the 5th most-cited domain globally on ChatGPT. AI click-throughs to Trustpilot surged 1,490% year-over-year. The company's revenue hit $261 million with 20% growth.
A 1,490% surge in AI platforms pulling review data tells you exactly one thing: AI desperately wants trust signals, and reviews are the only ones that exist in an accessible format.
Add these up. The companies selling trust-adjacent data generate over $10 billion in annual revenue. The broader business information market was valued at $191.6 billion in 2025 and is projected to reach $306.6 billion by 2033.
The market for trust data is enormous and proven. But every dollar of it sells one of two things: opinions (what people said) or static records (what was filed). Nobody sells what happened next.
The Carfax Gap
The clearest way to see the missing product is through Carfax.
Carfax generates an estimated $230 million per year selling verified vehicle history reports. The company built this by aggregating data from 151,000 sources — DMVs, insurance companies, repair shops, manufacturers, auction houses — into 35 billion records indexed by VIN. No opinions. No reviews. Just: here is what happened to this specific car, verified by institutional records.
Now ask: who does this for businesses?
Not "4.2 stars on Google." Rather: Did customers come back? Did the business pay its suppliers on time? Did revenue grow or shrink? Did it pass its last health inspection? How many years has it survived?
The answer is nobody. The business equivalent of a Carfax report doesn't exist.
The Data Exists. The Product Doesn't.
In Norway, the Brønnøysund Register Centre publishes full financial statements for every registered company — revenue, profit, equity, employee count, founding date — via a free, machine-readable API. The Food Safety Authority (Mattilsynet) publishes inspection results for every restaurant. BankID provides proof of personhood for 4.6 million people with zero fraud on the NFC path.
This is outcome data. Verified. Institutional. Public. And nobody has packaged it as trust infrastructure for AI.
The Norwegian case is the most accessible example, but the pattern is global. The UK's Companies House publishes financial data. France's Infogreffe does the same. Every country with food safety regulations publishes inspection results. Payment processors — Stripe, Visa, Mastercard — sit on comprehensive behavioral data: who paid whom, how often, and whether they came back.
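To make "free, machine-readable" concrete, here is a minimal sketch of turning an open registry record into outcome signals. The Brønnøysund endpoint in the comment is real, but treat the exact JSON field names as assumptions (check the API documentation); the sample payload is invented so the sketch runs offline.

```python
from datetime import date

# A real client would fetch, e.g.:
#   https://data.brreg.no/enhetsregisteret/api/enheter/<orgnr>
# Here we use a hypothetical sample payload instead of a live call.
SAMPLE_RECORD = {
    "organisasjonsnummer": "123456789",        # business identifier
    "navn": "Eksempel Kafe AS",                # registered name
    "registreringsdatoEnhetsregisteret": "2012-03-15",
    "antallAnsatte": 14,                       # employee count
}

def extract_outcome_signals(record: dict, today: date) -> dict:
    """Map a raw registry record to structured outcome signals."""
    founded = date.fromisoformat(record["registreringsdatoEnhetsregisteret"])
    return {
        "business_id": record["organisasjonsnummer"],
        "name": record["navn"],
        "years_operating": (today - founded).days // 365,
        "employees": record.get("antallAnsatte", 0),
    }

signals = extract_outcome_signals(SAMPLE_RECORD, date(2026, 3, 15))
print(signals)
```

The point of the sketch: no scraping, no reviews, no NLP. The registry record already is outcome data; the only work is reshaping it.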
Three Reasons the Gap Persists
If the demand is proven and the data exists, why hasn't someone built this? Three structural reasons:
Content is easier to license. Publisher deals have familiar mechanics — contracts, IP law, payment schedules. OpenAI can wire $250 million to News Corp and get a clean data feed. Assembling outcome data requires aggregating from hundreds of fragmented sources: business registries, inspection agencies, payment processors, regulatory filings.
Privacy is genuinely hard. The most valuable outcome data — repeat visits, return rates, transaction frequency — involves individual behavior. Zero-knowledge proofs are now production-ready: zkTLS has 3 million+ verifications with zero fraud. Semaphore V4 generates proofs in 3 milliseconds. The technical bottleneck cleared in 2025-2026.
The cold start problem feels fatal. If you need behavioral data from millions of businesses, where do you start? The answer: you don't start with behavioral data. You start with public outcome data — registry filings, inspection results, financial statements — which has no cold start problem at all. Layer behavioral signals on top as the network grows.
What AI Companies Actually Need
Here's the product that doesn't exist: a trust data API where an AI model can query a business identifier and get back structured, verified outcome signals.
Not sentiment. Not reviews. Not what someone wrote on Reddit. Verified answers to concrete questions:
- How long has this business operated? (Registry data)
- Is it financially healthy? (Filed financial statements)
- Did it pass its last regulatory inspection? (Government records)
- Do customers return? (Aggregated, anonymized behavioral data)
- How does it compare to peers in its category and location? (Computed from all of the above)
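As a sketch of what one response from such an API could look like — every field name and the stub values below are hypothetical illustrations, not a spec:

```python
from dataclasses import dataclass, asdict

@dataclass
class TrustReport:
    """Hypothetical structured answer for one business identifier."""
    business_id: str
    years_operating: int           # from registry data
    financially_healthy: bool      # from filed financial statements
    last_inspection_passed: bool   # from government records
    customer_return_rate: float    # aggregated, anonymized, 0.0-1.0
    peer_percentile: int           # computed vs. category + location peers

def answer_query(business_id: str) -> TrustReport:
    # A real service would join registry, inspection, and payment
    # sources here; this stub returns fixed illustrative values.
    return TrustReport(
        business_id=business_id,
        years_operating=12,
        financially_healthy=True,
        last_inspection_passed=True,
        customer_return_rate=0.41,
        peer_percentile=78,
    )

report = answer_query("NO-123456789")
print(asdict(report))
```

Note what is absent: no star rating, no review text, no sentiment score. Each field traces back to a verifiable record or an aggregate computation, which is what lets an AI model anchor a recommendation to it.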
Every AI company that's licensing Yelp reviews at $25 per thousand calls would pay multiples of that for verified outcome data. Because outcome data does what reviews can't: it gives AI a ground truth to anchor its recommendations to.
Yelp's AI API proves the business model works. A business that sells AI companies something better than reviews — at a price that reflects the quality difference — enters a market that's already buying.
The Timing
Three things make this moment different from any previous attempt to build trust data infrastructure:
AI is the buyer that didn't exist before. Previous trust data products (D&B, FICO) sold to institutions making individual credit decisions. The new buyer is fundamentally different: AI platforms making millions of recommendations per day. ChatGPT has 883 million monthly active users. AI Overviews appear in 55% of Google searches. The volume of decisions that need trust data has increased by orders of magnitude — and the willingness to pay for it is proven by the billion-dollar content licensing spree.
Privacy technology matured. Contributing behavioral data without revealing individual behavior requires zero-knowledge proofs at consumer scale. That wasn't possible two years ago. It is now.
Regulation is creating the identity layer. eIDAS 2.0 mandates digital identity wallets for 450 million Europeans by end of 2026. BankID already covers essentially all Nordic adults. World ID has verified 18 million unique humans. The sybil-resistance infrastructure — ensuring each data contribution comes from a real, unique person — is being built by others, at their expense.
The problem is proven ($10B+ market). The data exists (registries, inspections, transactions). The privacy technology works (zkTLS, Semaphore). The identity layer is being built (eIDAS, BankID, World ID). The buyer is desperate and spending (AI content licensing).
What's missing is the product.
The Race Nobody Has Entered
Here's what's strange about this moment. AI companies are spending furiously on data. Trust data companies are posting record profits. Privacy technology just crossed the production threshold. And nobody — not Yelp, not Trustpilot, not D&B, not any AI company — has built a verified outcome data product for AI.
Yelp is closest, but they sell opinions through an API, not outcomes. Trustpilot is the biggest beneficiary, but passively — AI models cite them because there's nothing better, not because Trustpilot built an AI product. DataLane verifies listings, not quality. D&B sells self-reported credit data.
The race to build the trust data layer for AI hasn't started. Everyone is selling yesterday's data type — opinions, listings, content — to tomorrow's buyer. The first company to sell verified outcome data enters the market with the product everyone needs and nobody has.
We know the market works. We know what the product looks like. We know the technology is ready. The only question left is who builds it first.
This is the fourth essay in a series on behavioral commitment as trust infrastructure.
DEV Community
https://dev.to/piiiico/the-10-billion-trust-data-market-that-ai-companies-cant-see-2pj6
