Scanned 500 AI agent repos for bugs, nobody thinks of infinite loops
Article URL: https://inkog.io/report
Comments URL: https://news.ycombinator.com/item?id=47642960
Points: 2, Comments: 1
Research Report
Findings from scanning 500+ open-source AI agent projects
The largest security analysis of the AI agent ecosystem. Original data from automated static analysis — not surveys or interviews.
85% of repos had at least one vulnerability
25% failed EU AI Act Article 14 (human oversight)
11,705 total findings across all repositories
500+ Repos Scanned
85% With Findings
63% CRITICAL/HIGH
25% Article 14 Fail
What you'll learn
500+ repos. 11,705 findings. 10 frameworks compared. Here's what the data reveals.
Which vulnerability appears in 4 out of 5 agent repos?
The top 10 vulnerability types ranked by prevalence — and why the #1 finding isn't prompt injection.
Which framework has 3x more critical findings than average?
Head-to-head security comparison across LangChain, CrewAI, AutoGen, pydantic-ai, MCP servers, and more.
Why 25% of repos fail EU AI Act Article 14
Compliance readiness scores for every repo. Article-by-article breakdown of where the ecosystem falls short.
MCP servers: the new attack surface nobody is auditing
The first large-scale security audit of MCP server repositories. Tool poisoning, argument injection, and credential exposure.
What goes wrong in repos with 25K+ stars
Anonymized deep-dives into popular frameworks. High star counts don't mean high security — here's the proof.
The 5 fixes that eliminate 80% of findings
Actionable remediation guidance for developers, security teams, and CISOs. Mapped to OWASP Agentic Top 10 and NIST AI RMF.
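The title singles out infinite loops. This page does not show the report's own remediation, so the following is only a generic sketch of one common guard: a hard cap on agent iterations. The names `run_agent`, `step`, `max_steps`, and `StepLimitExceeded` are hypothetical and do not come from the report or from any particular framework.

```python
class StepLimitExceeded(RuntimeError):
    """Raised when the agent loop hits its iteration cap (hypothetical name)."""


def run_agent(step, state, max_steps=10):
    """Call step(state) repeatedly until it reports completion.

    `step` is assumed to return a (new_state, done) pair. Without the
    max_steps cap, a step that never reports done would loop forever.
    """
    for _ in range(max_steps):
        state, done = step(state)
        if done:
            return state
    raise StepLimitExceeded(f"agent did not finish within {max_steps} steps")


def count_to_three(n):
    # Toy step function: increments a counter and finishes at 3.
    n += 1
    return n, n >= 3


def never_done(n):
    # Toy step function that never terminates on its own.
    return n + 1, False
```

Calling `run_agent(count_to_three, 0)` returns once the step reports completion, while `run_agent(never_done, 0, max_steps=5)` raises `StepLimitExceeded` instead of spinning indefinitely.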
Methodology
1. Discovery
40 GitHub search queries targeting AI agent frameworks (LangChain, CrewAI, AutoGen, MCP servers, and 35+ others). Top 100 results per query, sorted by stars. Deduplicated and filtered to repos with 20+ stars, no forks.
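The selection rules in this step (top 100 per query by stars, deduplication, 20+ stars, no forks) can be sketched roughly as below. This is an illustration under assumptions, not the report's actual pipeline: the `Repo` record, `select_repos`, and its parameters are hypothetical, and the search results are assumed to have been fetched already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repo:
    # Hypothetical record for one search hit.
    full_name: str  # "owner/name"
    stars: int
    is_fork: bool


def select_repos(results_per_query, per_query_limit=100, min_stars=20):
    """Merge per-query results into one deduplicated, filtered list."""
    selected = {}
    for results in results_per_query:
        # Keep only the top results of each query, ranked by star count.
        top = sorted(results, key=lambda r: r.stars, reverse=True)[:per_query_limit]
        for repo in top:
            key = repo.full_name.lower()  # same repo may appear under several queries
            if key in selected:
                continue
            if repo.is_fork or repo.stars < min_stars:
                continue
            selected[key] = repo
    return sorted(selected.values(), key=lambda r: r.stars, reverse=True)


query_1 = [Repo("a/x", 50, False), Repo("b/y", 10, False), Repo("c/z", 300, True)]
query_2 = [Repo("A/X", 50, False), Repo("d/w", 20, False)]
kept = select_repos([query_1, query_2])
```

Here `kept` contains only `a/x` and `d/w`: the duplicate hit, the fork, and the repo below 20 stars are dropped.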
2. Scanning
Each repo shallow-cloned and scanned with Inkog v1.1.0 using the comprehensive policy (all detectors, no confidence filtering). Results parsed and stored as structured JSON.
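Once per-repo results exist as JSON, headline figures such as "share of repos with at least one finding" reduce to a simple aggregation. The sketch below assumes a made-up schema (`{"repo": ..., "findings": [{"severity": ...}]}`); it is not Inkog's actual output format, and reading the CRITICAL/HIGH figure as a per-repo share is also an assumption.

```python
import json


def summarize(scan_results):
    """Aggregate parsed per-repo results into headline statistics."""
    total_repos = len(scan_results)
    with_findings = 0
    with_severe = 0
    total_findings = 0
    for result in scan_results:
        findings = result.get("findings", [])
        total_findings += len(findings)
        if findings:
            with_findings += 1
        # Assumed interpretation: a repo counts if any finding is CRITICAL or HIGH.
        if any(f.get("severity") in {"CRITICAL", "HIGH"} for f in findings):
            with_severe += 1

    def pct(count):
        return round(100 * count / total_repos) if total_repos else 0

    return {
        "repos": total_repos,
        "pct_with_findings": pct(with_findings),
        "pct_critical_or_high": pct(with_severe),
        "total_findings": total_findings,
    }


# Hypothetical stored results, one JSON document per repo.
raw = [
    '{"repo": "a/x", "findings": [{"severity": "HIGH"}, {"severity": "LOW"}]}',
    '{"repo": "b/y", "findings": []}',
    '{"repo": "c/z", "findings": [{"severity": "MEDIUM"}]}',
    '{"repo": "d/w", "findings": [{"severity": "CRITICAL"}]}',
]
summary = summarize([json.loads(doc) for doc in raw])
```

For this toy input, three of four repos have findings, two have a CRITICAL or HIGH finding, and there are four findings in total.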
3. Analysis
Inkog's Universal IR engine converts any agent framework to a framework-agnostic intermediate representation. Detection rules, DFG taint analysis, and compliance mapping run on this unified IR.
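To make "DFG taint analysis" concrete, here is a textbook-style sketch: taint is propagated from untrusted sources along data-flow edges, and any sink it reaches without passing a sanitizer is flagged. This is a generic reachability illustration, not Inkog's engine or IR; the graph shape, node names, and `tainted_sinks` are all hypothetical.

```python
from collections import deque


def tainted_sinks(edges, sources, sinks, sanitizers=frozenset()):
    """Return sinks reachable from a source without crossing a sanitizer."""
    tainted = set()
    queue = deque(s for s in sources if s not in sanitizers)
    while queue:
        node = queue.popleft()
        if node in tainted:
            continue
        tainted.add(node)
        for successor in edges.get(node, ()):
            # Taint stops at sanitizer nodes and is not re-queued once seen.
            if successor not in sanitizers and successor not in tainted:
                queue.append(successor)
    return sorted(tainted & set(sinks))


# Hypothetical data-flow graph for a small agent.
edges = {
    "user_input": ["prompt"],
    "prompt": ["llm_call"],
    "llm_call": ["shell_exec", "escape"],
    "escape": ["sql_query"],
}
flagged = tainted_sinks(
    edges,
    sources={"user_input"},
    sinks={"shell_exec", "sql_query"},
    sanitizers={"escape"},
)
```

In this example only `shell_exec` is flagged: the path to `sql_query` passes through the `escape` sanitizer, so taint does not reach it.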
4. Compliance Mapping
Every finding automatically mapped to EU AI Act articles, NIST AI RMF controls, and OWASP Agentic Top 10 entries. Governance scores computed for each repository.
Based on scanning 500+ repositories across every major AI agent framework. The only report backed by automated static analysis data — not surveys or interviews.
LangChain, CrewAI, AutoGen, pydantic-ai, LangGraph, MCP Servers, OpenAI Agents, n8n, Flowise, DSPy