Why SOC analysts get inconsistent results from ChatGPT (and how structured workflows fix it)

DEV Community | by gaurav kundu | April 2, 2026 | 2 min read


If you've ever handed a security alert to ChatGPT and gotten a different answer each time, you've hit the real problem.

It's not the model. It's the prompt.

Most analysts paste an alert and ask "what do you think?" That's like asking a junior analyst to investigate without a runbook. You'll get something back, but the quality depends entirely on how the question was framed.

The real problem: no structure

Experienced SOC analysts don't wing investigations. They follow a process:

Triage the alert
Map to MITRE ATT&CK
Check for lateral movement
Build a containment recommendation
Write a ticket summary

The issue is that most AI-assisted workflows skip steps 2–5 and jump straight to "is this bad?"
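One way to keep a prompt from skipping steps is to spell the process out in the prompt itself. The sketch below is only an illustration of that idea, not the actual SOC.Workflows prompt text: the function name `build_investigation_prompt`, the instruction wording, and the sample alert are all hypothetical.

```python
# Hypothetical sketch: embed the five-step process in the prompt itself.
# The step names come from the list above; everything else is illustrative.

INVESTIGATION_STEPS = [
    "Triage the alert",
    "Map to MITRE ATT&CK",
    "Check for lateral movement",
    "Build a containment recommendation",
    "Write a ticket summary",
]


def build_investigation_prompt(alert_text: str) -> str:
    """Return a prompt that asks the model to answer every step in order."""
    numbered = "\n".join(
        f"{i}. {step}" for i, step in enumerate(INVESTIGATION_STEPS, start=1)
    )
    return (
        "You are assisting a SOC analyst. Work through the steps below in order.\n"
        "Answer under one heading per step and do not skip any step.\n\n"
        f"Steps:\n{numbered}\n\n"
        f"Alert:\n{alert_text.strip()}\n"
    )


if __name__ == "__main__":
    # Made-up alert text, used only to show the shape of the prompt.
    sample_alert = "Multiple failed logins followed by a success for user jdoe"
    print(build_investigation_prompt(sample_alert))
```

Because the steps are fixed in the prompt, two analysts pasting the same alert start from the same framing, which is the point the runbook comparison makes.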

What I built

I built SOC.Workflows, a free collection of structured investigation workflows for SOC analysts. Each workflow breaks an investigation into four steps, with specific prompts for each step, and is designed to run in ChatGPT or Claude.

Current workflows:

Phishing Email Investigation
AWS VPC Flow Log Analysis
PowerShell & Script Analysis
Credential Dumping Investigation
Ransomware Triage
Identity Compromise Investigation
URL & Domain Analysis
SOC Alert Triage
Explain This Alert

How it works

Pick a workflow matching your alert type
Copy the workflow prompt
Paste into ChatGPT or Claude
Get structured, step-by-step analysis

No login. No setup. No API keys.
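The first step, picking a workflow that matches the alert type, is done by hand on the site. For teams that want to suggest a workflow automatically, a rough sketch might look like the following. The workflow names are the ones listed above; the keyword mapping and the `pick_workflow` helper are invented for illustration and are not part of SOC.Workflows.

```python
# Hypothetical sketch of "pick a workflow matching your alert type".
# Workflow names are from the list above; the keywords are assumptions.

WORKFLOW_KEYWORDS = {
    "Phishing Email Investigation": ("phish", "email"),
    "AWS VPC Flow Log Analysis": ("vpc", "flow log"),
    "PowerShell & Script Analysis": ("powershell", "script"),
    "Credential Dumping Investigation": ("lsass", "credential dump"),
    "Ransomware Triage": ("ransom",),
    "Identity Compromise Investigation": ("impossible travel", "mfa", "account takeover"),
    "URL & Domain Analysis": ("url", "domain"),
}

# Fall back to the general-purpose workflow when nothing matches.
DEFAULT_WORKFLOW = "SOC Alert Triage"


def pick_workflow(alert_title: str) -> str:
    """Return the first workflow whose keywords appear in the alert title."""
    title = alert_title.lower()
    for workflow, keywords in WORKFLOW_KEYWORDS.items():
        if any(keyword in title for keyword in keywords):
            return workflow
    return DEFAULT_WORKFLOW
```

Simple substring matching will misfire on ambiguous titles, so a suggestion like this is best treated as a default the analyst can override.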

Why structure matters

When I ran the same phishing alert through an unstructured prompt vs. the structured workflow, the difference was clear:

Unstructured: "This looks like a phishing email. Check the sender domain."

Structured: SPF/DKIM validation → header analysis → sender reputation → verdict with risk score → recommended response actions

Same model. Completely different output quality.
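To make the structured version concrete, here is a minimal sketch of a phishing prompt that lists those steps and pre-fills the SPF/DKIM results from the email's `Authentication-Results` header. It uses only Python's standard `email` and `re` modules. The helper names, the prompt wording, and the sample message are hypothetical, and this is not the prompt SOC.Workflows ships.

```python
import re
from email import message_from_string

# Step names taken from the structured output described above.
PHISHING_STEPS = [
    "SPF/DKIM validation",
    "Header analysis",
    "Sender reputation",
    "Verdict with risk score",
    "Recommended response actions",
]


def extract_auth_results(raw_email: str) -> dict:
    """Pull the spf= and dkim= results out of Authentication-Results headers."""
    message = message_from_string(raw_email)
    headers = " ".join(message.get_all("Authentication-Results", []))
    results = {}
    for mechanism in ("spf", "dkim"):
        match = re.search(rf"\b{mechanism}=(\w+)", headers, flags=re.IGNORECASE)
        results[mechanism] = match.group(1).lower() if match else "not found"
    return results


def build_phishing_prompt(raw_email: str) -> str:
    """Return a prompt that walks the model through each phishing step in order."""
    auth = extract_auth_results(raw_email)
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(PHISHING_STEPS, start=1))
    return (
        "Investigate this email as a possible phish. Cover each step in order:\n"
        f"{steps}\n\n"
        f"Parsed from headers (verify, do not assume): spf={auth['spf']}, dkim={auth['dkim']}\n\n"
        f"Raw email:\n{raw_email}"
    )
```

Only `Authentication-Results` headers added by your own mail infrastructure should be trusted, since a sender can insert forged ones, which is why the prompt asks the model to verify the parsed values rather than assume them.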

Try it

If you work in a SOC or do blue team work, I'd love feedback on which investigation types are missing.

👉 socworkflows.com — free, no login required

Original source

DEV Community

https://dev.to/gaurav_kundu_c6eee7120819/why-soc-analysts-get-inconsistent-results-from-chatgpt-and-how-structured-workflows-fix-it-24mb
