
Monitor Insider Trading Without Parsing SEC XML — Form 4 Data as Clean JSON

DEV Community · by lulzasaur · April 1, 2026 · 7 min read


SEC Form 4 filings are one of the most useful public datasets for tracking what company insiders (CEOs, directors, 10% owners) are doing with their stock. When a CEO buys 50,000 shares of their own company, that's a signal.

The problem: getting this data from SEC EDGAR is genuinely painful.

The EDGAR Problem

EDGAR serves filings as nested XML/SGML documents, and there's no REST API that returns Form 4 transactions as structured data. Here's what you're dealing with:

  • CIK-based lookups: You need to map ticker symbols to CIK numbers (EDGAR's internal ID system)

  • Full-text search returns everything: searching for "AAPL" returns every filing type (10-Ks, 8-Ks, proxies), not just insider trades

  • XML parsing: Form 4 filings use EDGAR's ownership XML format (not XBRL) with deeply nested elements. The actual transaction data is buried inside separate non-derivative and derivative transaction tables

  • No structured filtering: EDGAR's bulk feeds and indexes list every filing, so you build the filtering logic yourself
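The first item is the easiest to handle yourself: the SEC publishes a ticker-to-CIK file, `company_tickers.json`, on sec.gov. This minimal sketch assumes that file's usual shape (numbered entries with `cik_str`, `ticker`, and `title` fields) and uses a one-entry sample in place of a real download; `build_ticker_to_cik` is a hypothetical helper name.

```python
import json


def build_ticker_to_cik(company_tickers: dict) -> dict:
    """Map upper-case tickers to 10-digit, zero-padded CIK strings.

    Assumes the shape of SEC's company_tickers.json:
    {"0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."}, ...}
    """
    mapping = {}
    for entry in company_tickers.values():
        # EDGAR data URLs use the CIK left-padded with zeros to 10 digits.
        mapping[entry["ticker"].upper()] = str(entry["cik_str"]).zfill(10)
    return mapping


# Sample data in the assumed file format (not a live download).
sample = json.loads('{"0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."}}')
ticker_to_cik = build_ticker_to_cik(sample)
print(ticker_to_cik["AAPL"])  # 0000320193
```

Padding to 10 digits matches the format EDGAR uses in its data URLs (for example, `CIK0000320193.json`).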

It's easy to spend 2-3 days just getting the XML parsing right before you can extract a single transaction.
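To make the nesting concrete, here is a minimal sketch that extracts non-derivative transactions from a Form 4 XML document with Python's standard-library `xml.etree.ElementTree`. It assumes the element names of EDGAR's ownership XML schema (`nonDerivativeTransaction`, `transactionCoding/transactionCode`, `transactionAmounts/transactionShares/value`, and so on) and parses a hypothetical, heavily trimmed sample; real filings also carry footnote references, derivative transactions, and sometimes several reporting owners.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed Form 4 document for illustration only.
SAMPLE_FORM4 = """<ownershipDocument>
  <issuer>
    <issuerCik>0000320193</issuerCik>
    <issuerTradingSymbol>AAPL</issuerTradingSymbol>
  </issuer>
  <nonDerivativeTable>
    <nonDerivativeTransaction>
      <securityTitle><value>Common Stock</value></securityTitle>
      <transactionDate><value>2026-03-25</value></transactionDate>
      <transactionCoding><transactionCode>P</transactionCode></transactionCoding>
      <transactionAmounts>
        <transactionShares><value>50000</value></transactionShares>
        <transactionPricePerShare><value>178.50</value></transactionPricePerShare>
      </transactionAmounts>
      <postTransactionAmounts>
        <sharesOwnedFollowingTransaction><value>3500000</value></sharesOwnedFollowingTransaction>
      </postTransactionAmounts>
    </nonDerivativeTransaction>
  </nonDerivativeTable>
</ownershipDocument>"""


def text_at(node, path):
    """Return the stripped text at `path`, or None if the element is missing or empty."""
    found = node.find(path)
    return found.text.strip() if found is not None and found.text else None


def to_float(raw):
    """Convert an optional numeric string to float."""
    return float(raw) if raw else None


def parse_non_derivative(xml_text):
    """Flatten every nonDerivativeTransaction into a dict."""
    root = ET.fromstring(xml_text)
    rows = []
    for tx in root.iter("nonDerivativeTransaction"):
        rows.append({
            "security": text_at(tx, "securityTitle/value"),
            "date": text_at(tx, "transactionDate/value"),
            "code": text_at(tx, "transactionCoding/transactionCode"),
            "shares": to_float(text_at(tx, "transactionAmounts/transactionShares/value")),
            "pricePerShare": to_float(text_at(tx, "transactionAmounts/transactionPricePerShare/value")),
            "sharesAfter": to_float(
                text_at(tx, "postTransactionAmounts/sharesOwnedFollowingTransaction/value")
            ),
        })
    return rows


print(parse_non_derivative(SAMPLE_FORM4))
```

Most values sit in a `<value>` child one level below their field name, which is a large part of why hand-written Form 4 parsers get verbose.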

Skip the XML

I built an API that does all the EDGAR parsing and returns clean JSON. Here's what a Form 4 query looks like:

```bash
curl "https://your-api/sec/insider-trades?ticker=AAPL&limit=5"
```

```json
{
  "success": true,
  "total": 1542,
  "filings": [
    {
      "accession": "0000950123-26-001456",
      "filedDate": "2026-03-31",
      "periodOfReport": "2026-03-27",
      "insider": {
        "name": "Cook Timothy D",
        "cik": "0001234567"
      },
      "company": {
        "name": "APPLE INC",
        "cik": "0000320193"
      }
    }
  ]
}
```

Get full transaction details for any filing:

```bash
curl "https://your-api/sec/insider-trades/filing/0000950123-26-001456"
```

```json
{
  "issuer": {
    "cik": "0000320193",
    "name": "APPLE INC",
    "ticker": "AAPL"
  },
  "owner": {
    "name": "Cook Timothy D",
    "isDirector": true,
    "isOfficer": true,
    "title": "Chief Executive Officer"
  },
  "transactions": [
    {
      "security": "Common Stock",
      "date": "2026-03-25",
      "code": "P",
      "codeLabel": "Purchase",
      "shares": 50000,
      "pricePerShare": 178.50,
      "sharesAfter": 3500000,
      "ownership": "direct"
    }
  ]
}
```

That gives you ticker-based search (no CIK mapping needed), results pre-filtered to Form 4 only, and transactions parsed into flat JSON.
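If you consume the filing-detail response in Python, a few dataclasses keep the flat JSON typed. This is a minimal sketch assuming the field names in the example response above; the class and function names are hypothetical, and fields that may be absent (such as `title` or `pricePerShare` on a grant) are treated as optional.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transaction:
    security: str
    date: str
    code: str
    shares: float
    price_per_share: Optional[float]
    shares_after: Optional[float]
    ownership: str

    @property
    def value(self) -> float:
        # Dollar value of the trade; zero when no price was reported.
        return self.shares * (self.price_per_share or 0.0)


@dataclass
class FilingDetail:
    ticker: str
    owner_name: str
    owner_title: Optional[str]
    transactions: List[Transaction]


def parse_filing_detail(payload: dict) -> FilingDetail:
    """Convert the assumed filing-detail JSON shape into dataclasses."""
    txs = [
        Transaction(
            security=tx["security"],
            date=tx["date"],
            code=tx["code"],
            shares=tx["shares"],
            price_per_share=tx.get("pricePerShare"),
            shares_after=tx.get("sharesAfter"),
            ownership=tx.get("ownership", "unknown"),
        )
        for tx in payload.get("transactions", [])
    ]
    return FilingDetail(
        ticker=payload["issuer"]["ticker"],
        owner_name=payload["owner"]["name"],
        owner_title=payload["owner"].get("title"),
        transactions=txs,
    )


# Usage with a dict mirroring the example response.
example = {
    "issuer": {"cik": "0000320193", "name": "APPLE INC", "ticker": "AAPL"},
    "owner": {"name": "Cook Timothy D", "isDirector": True, "isOfficer": True,
              "title": "Chief Executive Officer"},
    "transactions": [{"security": "Common Stock", "date": "2026-03-25", "code": "P",
                      "codeLabel": "Purchase", "shares": 50000, "pricePerShare": 178.50,
                      "sharesAfter": 3500000, "ownership": "direct"}],
}
detail = parse_filing_detail(example)
print(detail.transactions[0].value)  # 8925000.0
```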

Building an Insider Trading Monitor

Here's a Python script that checks for insider purchases above $100K:

```python
import requests
from datetime import datetime, timedelta

API_URL = "https://your-api/sec/insider-trades"
API_KEY = "your-api-key"
WATCHLIST = ["AAPL", "MSFT", "GOOGL", "AMZN", "TSLA"]

headers = {"X-Api-Key": API_KEY}
yesterday = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")

for ticker in WATCHLIST:
    resp = requests.get(
        API_URL,
        params={"ticker": ticker, "startDate": yesterday},
        headers=headers,
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()

    for filing in data.get("filings", []):
        # Get full transaction details for this filing
        detail_resp = requests.get(
            f"{API_URL}/filing/{filing['accession']}",
            headers=headers,
            timeout=30,
        )
        detail_resp.raise_for_status()
        detail = detail_resp.json()

        for tx in detail.get("transactions", []):
            if tx["code"] == "P":  # Open market purchase
                value = tx["shares"] * tx["pricePerShare"]
                if value > 100_000:
                    print(
                        f"🚨 {detail['owner']['name']} "
                        f"({detail['owner'].get('title', 'Insider')}) "
                        f"bought {tx['shares']:,} shares of {ticker} "
                        f"at ${tx['pricePerShare']:.2f} "
                        f"(${value:,.0f} total)"
                    )
```

Run this on a daily cron and you've got an insider trading alert system. The code is ~30 lines because all the hard work (XML parsing, CIK resolution, filing filtering) is handled by the API.
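Before putting the script on cron, it can help to pull the filtering out of the network loop so it can be tested against a saved response. A minimal sketch, assuming the filing-detail fields from the example response above; `find_large_purchases` is a hypothetical helper, not part of the API, and it skips purchases with no reported price instead of raising.

```python
from typing import Iterator


def find_large_purchases(detail: dict, ticker: str, threshold: float = 100_000) -> Iterator[str]:
    """Yield one alert line per open-market purchase ('P') worth more than `threshold`."""
    owner = detail.get("owner", {})
    for tx in detail.get("transactions", []):
        price = tx.get("pricePerShare")
        # Only purchases with a reported price can be valued.
        if tx.get("code") != "P" or price is None:
            continue
        value = tx["shares"] * price
        if value > threshold:
            yield (
                f"{owner.get('name', 'Unknown')} ({owner.get('title', 'Insider')}) "
                f"bought {tx['shares']:,} shares of {ticker} "
                f"at ${price:.2f} (${value:,.0f} total)"
            )


# A saved (here: hand-written) detail payload for offline testing.
sample = {
    "owner": {"name": "Cook Timothy D", "title": "Chief Executive Officer"},
    "transactions": [
        {"code": "P", "shares": 50000, "pricePerShare": 178.50},
        {"code": "A", "shares": 10000, "pricePerShare": None},
    ],
}
for line in find_large_purchases(sample, "AAPL"):
    print(line)
```

The network loop then only needs to call `find_large_purchases(detail, ticker)` and print (or send) whatever it yields.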

Transaction Codes Explained

Form 4 transactions are tagged with single-letter codes. The most common ones:

| Code | Meaning | Signal |
|------|---------|--------|
| P | Open market purchase | Bullish: insider buying with own money |
| S | Open market sale | Could be planned (10b5-1) or discretionary |
| A | Grant/award | Compensation, not a market signal |
| M | Option exercise | Converting options to shares |
| F | Tax withholding | Automatic, not discretionary |
| G | Gift | Estate planning, not a trading signal |

The most interesting transactions are P (purchases) and S (sales), because these can reflect discretionary decisions by insiders. Keep in mind that sales under a 10b5-1 plan are scheduled in advance.
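In code, the table reduces to a lookup. A minimal sketch covering only the six codes listed (Form 4 defines others, which fall through to "Other" here); `TRANSACTION_CODES`, `DISCRETIONARY`, and `classify` are hypothetical names, and the "potential signal" flag simply mirrors the P/S distinction above, not a 10b5-1 check.

```python
from typing import Tuple

TRANSACTION_CODES = {
    "P": "Open market purchase",
    "S": "Open market sale",
    "A": "Grant/award",
    "M": "Option exercise",
    "F": "Tax withholding",
    "G": "Gift",
}

# Codes that can reflect a discretionary buy/sell decision.
# Sales ('S') may still be pre-scheduled under a 10b5-1 plan.
DISCRETIONARY = {"P", "S"}


def classify(code: str) -> Tuple[str, bool]:
    """Return (label, is_potential_signal) for a Form 4 transaction code."""
    code = code.upper()
    return TRANSACTION_CODES.get(code, "Other"), code in DISCRETIONARY


for c in ["P", "F", "J"]:
    print(c, classify(c))
```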

What You'd Build Without This

For context, here's what the DIY version looks like:

  • Build a CIK-to-ticker mapping table (SEC provides a bulk file, ~13,000 companies)

  • Write an EDGAR full-text search query parser

  • Build XML/SGML parsers for Form 4 documents (two different schemas depending on filing date)

  • Handle footnotes, amendments (Form 4/A), and derivative transactions

  • Implement rate limiting (SEC caps automated access at 10 requests per second and requires a User-Agent header that identifies you)

  • Build storage and deduplication logic
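The rate-limiting item is easy to get subtly wrong. A minimal sketch of a client that spaces requests to stay at or under 10 per second and always sends a User-Agent, using only the standard library (`urllib.request`); the class name and the contact string are placeholders, and a production crawler would also back off and retry on errors.

```python
import threading
import time
import urllib.request


class ThrottledEdgarClient:
    """Hypothetical helper: spaces GETs to at most `max_per_second` requests."""

    def __init__(self, user_agent: str, max_per_second: float = 10.0):
        # SEC asks automated clients to identify themselves via User-Agent.
        self.user_agent = user_agent
        self.min_interval = 1.0 / max_per_second
        self._last = 0.0
        self._lock = threading.Lock()

    def _wait_turn(self) -> None:
        # Sleep just long enough that consecutive requests are >= min_interval apart.
        with self._lock:
            delay = self._last + self.min_interval - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            self._last = time.monotonic()

    def get(self, url: str, timeout: float = 30.0) -> bytes:
        """Fetch `url`; urlopen raises HTTPError on non-2xx responses."""
        self._wait_turn()
        req = urllib.request.Request(url, headers={"User-Agent": self.user_agent})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()


# Usage (performs a real network request):
# client = ThrottledEdgarClient("Your Name your.email@example.com")
# raw = client.get("https://www.sec.gov/files/company_tickers.json")
```

The lock makes the spacing hold even if several threads share one client.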

That's a week of work minimum, plus ongoing maintenance whenever EDGAR changes its schema.

I vibe-coded this for my own trading research. It's on RapidAPI, with a free tier if you want to poke around.
