research-llm-apis 2026-04-04
Release: research-llm-apis 2026-04-04 I'm working on a major change to my LLM Python library and CLI tool. LLM provides an abstraction layer over hundreds of different LLMs from dozens of different vendors thanks to its plugin system, and some of those vendors have grown new features over the past year which LLM's abstraction layer can't handle, such as server-side tool execution. To help design that new abstraction layer I had Claude Code read through the Python client libraries for Anthropic, OpenAI, Gemini and Mistral and use those to help craft curl commands to access the raw JSON for both streaming and non-streaming modes across a range of different scenarios. Both the scripts and the captured outputs now live in this new repo. Tags: llm , apis , json , llms
I'm working on a major change to my LLM Python library and CLI tool. LLM provides an abstraction layer over hundreds of different LLMs from dozens of different vendors thanks to its plugin system, and some of those vendors have grown new features over the past year which LLM's abstraction layer can't handle, such as server-side tool execution.
To help design that new abstraction layer I had Claude Code read through the Python client libraries for Anthropic, OpenAI, Gemini and Mistral and use those to help craft curl commands to access the raw JSON for both streaming and non-streaming modes across a range of different scenarios. Both the scripts and the captured outputs now live in this new repo.
Simon Willison Blog
https://simonwillison.net/2026/Apr/5/research-llm-apis/#atom-everythingSign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
claudegeminimistral
What happened to MLX-LM? What are the alternatives?
Support seems non-existent and the last proper release was over a month ago. Comparing with llama.cpp, they are just miles different in activity and support. Is there an alternative or should just use llama.cpp for my macbook? submitted by /u/Solus23451 [link] [comments]

Qodo vs Tabnine: AI Coding Assistants Compared (2026)
Quick Verdict Qodo and Tabnine address genuinely different problems. Qodo is a code quality specialist - its entire platform is built around making PRs better through automated review and test generation. Tabnine is a privacy-first code assistant - its entire platform is built around delivering AI coding help in environments where data sovereignty cannot be compromised. Choose Qodo if: your team needs the deepest available AI PR review, you want automated test generation that proactively closes coverage gaps, you use GitLab or Azure DevOps alongside GitHub, or you want the open-source transparency of PR-Agent as your review foundation. Choose Tabnine if: your team needs AI code completion as a primary feature, your organization requires on-premise or fully air-gapped deployment with battle
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Models

I wrote a fused MoE dispatch kernel in pure Triton that beats Megablocks on Mixtral and DeepSeek at inference batch sizes
Been working on custom Triton kernels for LLM inference for a while. My latest project: a fused MoE dispatch pipeline that handles the full forward pass in 5 kernel launches instead of 24+ in the naive approach. Results on Mixtral-8x7B (A100): Tokens vs PyTorch vs Megablocks 32 4.9x 131% 128 5.8x 124% 512 6.5x 89% At 32 and 128 tokens (where most inference serving actually happens), it's faster than Stanford's CUDA-optimized Megablocks. At 512+ Megablocks pulls ahead with its hand-tuned block-sparse matmul. The key trick is fusing the gate+up projection so both GEMMs share the same input tile from L2 cache, and the SiLU activation happens in registers without ever hitting global memory. Saves ~470MB of memory traffic per forward pass on Mixtral. Also tested on DeepSeek-V3 (256 experts) and




Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!