
Monitor Insider Trading Without Parsing SEC XML — Form 4 Data as Clean JSON

DEV Community · by lulzasaur · April 1, 2026 · 7 min read


SEC Form 4 filings are one of the most useful public datasets for tracking what company insiders (CEOs, directors, 10% owners) are doing with their stock. When a CEO buys 50,000 shares of their own company, that's a signal.

The problem: getting this data from SEC EDGAR is genuinely painful.

The EDGAR Problem

EDGAR serves filings as nested XML/SGML documents. There's no proper REST API for structured Form 4 data. Here's what you're dealing with:

CIK-based lookups: You need to map ticker symbols to CIK numbers (EDGAR's internal ID system)
Full-text search returns everything: Searching for "AAPL" returns all 100+ filing types — 10-Ks, 8-Ks, proxies — not just insider trades
XML parsing: Form 4 filings use XBRL/XML with nested schemas. The actual transaction data is buried several levels deep inside nested table and transaction elements
No pagination or filtering: EDGAR's XBRL feeds dump everything. You build the filtering logic
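To make the first item concrete, here is a minimal sketch of a ticker-to-CIK lookup. It assumes the SEC bulk ticker file is a JSON object whose values carry `cik_str`, `ticker`, and `title` fields; the inline sample stands in for the real download, and the function name is hypothetical.

```python
import json

# Inline stand-in for the SEC bulk ticker file (assumed layout, two entries only).
SAMPLE_TICKER_FILE = """
{
  "0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."},
  "1": {"cik_str": 789019, "ticker": "MSFT", "title": "MICROSOFT CORP"}
}
"""


def build_ticker_index(raw_json: str) -> dict[str, str]:
    """Map upper-cased tickers to 10-digit, zero-padded CIK strings."""
    entries = json.loads(raw_json).values()
    return {
        entry["ticker"].upper(): f"{int(entry['cik_str']):010d}"
        for entry in entries
    }


ticker_index = build_ticker_index(SAMPLE_TICKER_FILE)
print(ticker_index["AAPL"])  # 0000320193
```

The zero-padding matters because EDGAR identifiers are usually written as 10-digit strings, as in the `cik` fields of the JSON responses later in this post.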

Most developers spend 2-3 days just getting the XML parsing right before they can extract a single transaction.
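For a sense of what that parsing involves, here is a rough sketch using the standard library's `xml.etree.ElementTree`. The XML fragment is a heavily trimmed, hand-written approximation of a Form 4 ownership document; the element names reflect my understanding of the schema and should be checked against a real filing. `text_at` and `parse_transactions` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed approximation of a Form 4 document (assumed element names).
SAMPLE_FORM4 = """<ownershipDocument>
  <issuer>
    <issuerCik>0000320193</issuerCik>
    <issuerTradingSymbol>AAPL</issuerTradingSymbol>
  </issuer>
  <nonDerivativeTable>
    <nonDerivativeTransaction>
      <securityTitle><value>Common Stock</value></securityTitle>
      <transactionDate><value>2026-03-25</value></transactionDate>
      <transactionCoding>
        <transactionCode>P</transactionCode>
      </transactionCoding>
      <transactionAmounts>
        <transactionShares><value>50000</value></transactionShares>
        <transactionPricePerShare><value>178.50</value></transactionPricePerShare>
      </transactionAmounts>
    </nonDerivativeTransaction>
  </nonDerivativeTable>
</ownershipDocument>"""


def text_at(node, path):
    """Return stripped text at a relative path, or None if it is absent or empty."""
    found = node.find(path)
    if found is None or found.text is None:
        return None
    return found.text.strip()


def parse_transactions(xml_text):
    """Flatten non-derivative transactions into plain dicts."""
    root = ET.fromstring(xml_text)
    rows = []
    for tx in root.iterfind("nonDerivativeTable/nonDerivativeTransaction"):
        shares = text_at(tx, "transactionAmounts/transactionShares/value")
        price = text_at(tx, "transactionAmounts/transactionPricePerShare/value")
        rows.append({
            "security": text_at(tx, "securityTitle/value"),
            "date": text_at(tx, "transactionDate/value"),
            "code": text_at(tx, "transactionCoding/transactionCode"),
            "shares": float(shares) if shares else None,
            "pricePerShare": float(price) if price else None,
        })
    return rows


rows = parse_transactions(SAMPLE_FORM4)
print(rows[0]["code"], rows[0]["shares"])
```

Even this toy version needs a null-safe path helper, and it ignores derivative transactions, footnotes, and amendments entirely.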

Skip the XML

I built an API that does all the EDGAR parsing and returns clean JSON. Here's what a Form 4 query looks like:

curl "https://your-api/sec/insider-trades?ticker=AAPL&limit=5"


    {
      "success": true,
      "total": 1542,
      "filings": [
        {
          "accession": "0000950123-26-001456",
          "filedDate": "2026-03-31",
          "periodOfReport": "2026-03-27",
          "insider": {
            "name": "Cook Timothy D",
            "cik": "0001234567"
          },
          "company": {
            "name": "APPLE INC",
            "cik": "0000320193"
          }
        }
      ]
    }


Get full transaction details for any filing:

curl "https://your-api/sec/insider-trades/filing/0000950123-26-001456"


    {
      "issuer": {
        "cik": "0000320193",
        "name": "APPLE INC",
        "ticker": "AAPL"
      },
      "owner": {
        "name": "Cook Timothy D",
        "isDirector": true,
        "isOfficer": true,
        "title": "Chief Executive Officer"
      },
      "transactions": [
        {
          "security": "Common Stock",
          "date": "2026-03-25",
          "code": "P",
          "codeLabel": "Purchase",
          "shares": 50000,
          "pricePerShare": 178.50,
          "sharesAfter": 3500000,
          "ownership": "direct"
        }
      ]
    }


You get ticker-based search (no CIK mapping needed), results pre-filtered to Form 4 only, and transactions parsed into flat JSON.
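The `accession` value is what links the list endpoint to the detail endpoint. EDGAR accession numbers follow a fixed shape (a 10-digit filer ID, a 2-digit year, and a 6-digit sequence number), so it can be worth validating one before building a URL from it. The sketch below does that; `parse_accession` and `detail_url` are hypothetical helper names, and the base URL is the placeholder used in this post.

```python
import re
from typing import NamedTuple

# 10-digit filer ID, 2-digit year, 6-digit sequence, separated by hyphens.
ACCESSION_RE = re.compile(r"^(\d{10})-(\d{2})-(\d{6})$")


class Accession(NamedTuple):
    filer_id: str
    year: str
    sequence: str


def parse_accession(value: str) -> Accession:
    """Split an accession number into its parts, or raise ValueError."""
    match = ACCESSION_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not an accession number: {value!r}")
    return Accession(*match.groups())


def detail_url(base_url: str, accession: str) -> str:
    """Build the filing-detail URL after validating the accession number."""
    parse_accession(accession)
    return f"{base_url}/filing/{accession}"


parts = parse_accession("0000950123-26-001456")
url = detail_url("https://your-api/sec/insider-trades", "0000950123-26-001456")
print(parts.year, url)
```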

Building an Insider Trading Monitor

Here's a Python script that checks for insider purchases above $100K:

    import requests
    from datetime import datetime, timedelta

    API_URL = "https://your-api/sec/insider-trades"
    API_KEY = "your-api-key"
    WATCHLIST = ["AAPL", "MSFT", "GOOGL", "AMZN", "TSLA"]

    headers = {"X-Api-Key": API_KEY}
    # Only look at filings from yesterday onward
    yesterday = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")

    for ticker in WATCHLIST:
        resp = requests.get(
            API_URL,
            params={"ticker": ticker, "startDate": yesterday},
            headers=headers,
        )
        data = resp.json()

        for filing in data.get("filings", []):
            # Get full transaction details
            detail = requests.get(
                f"{API_URL}/filing/{filing['accession']}",
                headers=headers,
            ).json()

            for tx in detail.get("transactions", []):
                if tx["code"] == "P":  # Purchase
                    value = tx["shares"] * tx["pricePerShare"]
                    if value > 100_000:
                        print(
                            f"🚨 {detail['owner']['name']} "
                            f"({detail['owner'].get('title', 'Insider')}) "
                            f"bought {tx['shares']:,} shares of {ticker} "
                            f"at ${tx['pricePerShare']:.2f} "
                            f"(${value:,.0f} total)"
                        )

Run this on a daily cron and you've got an insider trading alert system. The code is ~30 lines because all the hard work (XML parsing, CIK resolution, filing filtering) is handled by the API.
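If you want to test the alert logic without hitting the network, one option is to pull the filter into a pure function. This is a sketch under the assumption that the detail response has the shape shown earlier in this post; `large_purchases` is a hypothetical name, and the skip on missing prices is a defensive guess rather than documented API behavior.

```python
from typing import Iterator


def large_purchases(detail: dict, min_value: float = 100_000) -> Iterator[dict]:
    """Yield purchase transactions whose total value exceeds min_value."""
    for tx in detail.get("transactions", []):
        if tx.get("code") != "P":
            continue  # only open market purchases
        shares = tx.get("shares")
        price = tx.get("pricePerShare")
        if shares is None or price is None:
            continue  # assumption: some rows may come back without a price
        value = shares * price
        if value > min_value:
            yield {**tx, "value": value}


# Sample shaped like the filing-detail response, plus one grant that should be ignored.
sample_detail = {
    "owner": {"name": "Cook Timothy D", "title": "Chief Executive Officer"},
    "transactions": [
        {"code": "P", "shares": 50000, "pricePerShare": 178.50},
        {"code": "A", "shares": 10000, "pricePerShare": None},
    ],
}

hits = list(large_purchases(sample_detail))
print(len(hits), hits[0]["value"])  # 1 8925000.0
```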

Transaction Codes Explained

Form 4 transactions use single-letter codes:

Code | Meaning | Signal
--- | --- | ---
P | Open market purchase | Bullish: insider buying with own money
S | Open market sale | Could be planned (10b5-1) or discretionary
A | Grant/award | Compensation, not a market signal
M | Option exercise | Converting options to shares
F | Tax withholding | Automatic, not discretionary
G | Gift | Estate planning, not a trading signal

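The most interesting transactions are P (purchases) and S (sales). These are the open market trades, and they usually reflect discretionary decisions by insiders, although a sale may have been scheduled in advance under a 10b5-1 plan.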
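The same lookup can be expressed as a small table in code. This sketch covers only the six codes listed in this post; `describe` and `is_market_signal` are hypothetical helpers, and unknown codes are deliberately passed through rather than guessed at.

```python
# Codes and labels taken from the table in this post (not the full SEC code list).
TRANSACTION_CODES = {
    "P": "Open market purchase",
    "S": "Open market sale",
    "A": "Grant/award",
    "M": "Option exercise",
    "F": "Tax withholding",
    "G": "Gift",
}

# Codes this post treats as potential trading signals.
SIGNAL_CODES = frozenset({"P", "S"})


def describe(code: str) -> str:
    """Return a human-readable label, falling back to the raw code."""
    return TRANSACTION_CODES.get(code.upper(), f"Unknown code {code!r}")


def is_market_signal(code: str) -> bool:
    """True for open market purchases and sales."""
    return code.upper() in SIGNAL_CODES


for code in ("P", "F", "X"):
    print(code, describe(code), is_market_signal(code))
```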

What You'd Build Without This

For context, here's what the DIY version looks like:

Build a CIK-to-ticker mapping table (SEC provides a bulk file, ~13,000 companies)
Write an EDGAR full-text search query parser
Build XML/SGML parsers for Form 4 documents (two different schemas depending on filing date)
Handle XBRL footnotes, amendments (Form 4/A), and derivative transactions
Implement rate limiting (SEC throttles to 10 req/sec with a required User-Agent header)
Build storage and deduplication logic

That's a week of work minimum, plus ongoing maintenance when EDGAR changes their schema.
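As one example from that list, the rate limiting step might look something like the sketch below: a small throttle that spaces calls at least 0.1 seconds apart to stay at or under 10 requests per second. The `Throttle` class and the User-Agent string are hypothetical, and the clock and sleep functions are injectable so the logic can be tested without real delays.

```python
import time

# Hypothetical identity string; the SEC asks callers to send a descriptive User-Agent.
HEADERS = {"User-Agent": "example-research-app admin@example.com"}


class Throttle:
    """Space calls at least 1 / max_per_second seconds apart."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the next slot relative to whichever is later.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


# Demo with a fake clock so nothing actually sleeps.
fake_now = [100.0]


def fake_sleep(seconds):
    fake_now[0] += seconds


throttle = Throttle(10, clock=lambda: fake_now[0], sleep=fake_sleep)
first_delay = throttle.wait()   # no wait on the first call
second_delay = throttle.wait()  # immediate second call waits about 0.1 s
print(first_delay, round(second_delay, 3))
```

In real use you would call `throttle.wait()` before each request and pass `HEADERS` along with it.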

I vibe coded this for my own trading research. It's on RapidAPI — free tier if you want to poke around.

Original source

DEV Community

https://dev.to/lulzasaur/monitor-insider-trading-without-parsing-sec-xml-form-4-data-as-clean-json-3ome
