Built a zero-allocation, header-only C++ Qwen tokenizer that is nearly 20x faster than OpenAI's tiktoken
I'm into HPC and static, zero-allocation, zero-dependency C++ software. I was studying how BPE tokenizers work, so I decided to build this project: a hardcoded Qwen tokenizer for LLM developers. I know that the entire tokenization phase of LLM inference is worth less than 2% of total time, so it's practically negligible, but I just "love" this kind of programming; it's an educational project for me to learn and build some intuition. Surprisingly, after combining multiple different optimization techniques, it scored really high numbers in benchmarks. I thought it was a fluke at first, so I tried different tests, and so far it completely holds up. On a 12-thread Ryzen 5 3600 desktop CPU, over a 1 GB English text corpus:

- My Frokenizer: 1009 MB/s
- OpenAI tiktoken: ~50 MB/s
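The post doesn't include the author's C++ source, but the BPE merging it hardcodes can be sketched in a few lines. Below is a minimal, naive Python illustration of the greedy merge loop at the heart of a tiktoken-style BPE encoder; `toy_ranks` is a toy stand-in for a real vocabulary like Qwen's, and nothing here reflects the actual Frokenizer implementation or its optimizations.

```python
# A naive BPE merge loop: repeatedly merge the adjacent pair of byte
# sequences with the lowest (earliest-learned) rank until no learned
# pair remains. Real encoders like tiktoken do this far more efficiently.

def bpe_encode(token_bytes, ranks):
    """Split token_bytes into merged pieces according to a rank table."""
    parts = [bytes([b]) for b in token_bytes]
    while len(parts) > 1:
        best_pair, best_rank = None, None
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best_pair, best_rank = i, rank
        if best_pair is None:
            break  # no adjacent pair exists in the vocabulary
        i = best_pair
        parts = parts[:i] + [parts[i] + parts[i + 1]] + parts[i + 2:]
    return parts

# Toy vocabulary: lower rank means the pair was merged earlier in training.
toy_ranks = {b"lo": 0, b"ow": 1, b"low": 2}
print(bpe_encode(b"lower", toy_ranks))  # [b'low', b'e', b'r']
```

The quadratic scan above is exactly the kind of hot loop where zero-allocation, cache-friendly C++ pays off, which is presumably where the project's speedup comes from.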
Read on Reddit r/LocalLLaMA →
https://www.reddit.com/r/LocalLLaMA/comments/1sblae4/built_a_zero_allocation_header_only_c_qwen/

[Benchmark] Altered Riddles: Can LLMs ignore what they've memorised?
In the past year you may have encountered the following prompt: "The surgeon, who is the boy's father, says, 'I cannot operate on this boy—he's my son!' Who is the surgeon to the boy?" If you give this prompt to an LLM right now, you will probably still receive "The mother" as an answer, even though the text explicitly states that the surgeon is the boy's father. This is probably because the prompt is an alteration of a very common "riddle" to which the answer is, in fact, the mother: "A man and his son are in a terrible accident and are rushed to the hospital in critical condition. The doctor looks at the boy and exclaims, 'I can't operate on this boy; he's my son!' How could this be?" Working on this failure mode, I initially decided to create a small dataset of altered

An Empirical Study of Testing Practices in Open Source AI Agent Frameworks and Agentic Applications
arXiv:2509.19185v3 Announce Type: replace Abstract: Foundation model (FM)-based AI agents are rapidly gaining adoption across diverse domains, but their inherent non-determinism and non-reproducibility pose testing and quality assurance challenges. While recent benchmarks provide task-level evaluations, there is limited understanding of how developers verify the internal correctness of these agents during development. To address this gap, we conduct the first large-scale empirical study of testing practices in the AI agent ecosystem, analyzing 39 open-source agent frameworks and 439 agentic applications. We identify ten distinct testing patterns and find that novel, agent-specific methods like DeepEval are seldom used (around 1%), while traditional patterns like negative and membership tes
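The abstract names "negative" and "membership" testing as the traditional patterns developers fall back on. As a hedged illustration only (the paper's actual examples are not in this excerpt, and `check_reply` plus the hard-coded `agent_reply` are hypothetical stand-ins for a real agent call), those two patterns applied to an agent's text output might look like this:

```python
# Two traditional testing patterns applied to an agent reply:
# membership testing (a required element is present in the output) and
# negative testing (a forbidden element is absent from the output).

def check_reply(reply: str) -> dict:
    """Run both checks on one reply and report the results."""
    return {
        # Membership test: the reply should mention using the search tool.
        "mentions_tool": "search" in reply.lower(),
        # Negative test: the reply must never leak an API key prefix.
        "no_api_key_leak": "sk-" not in reply,
    }

agent_reply = "I used the search tool and found 3 results."
result = check_reply(agent_reply)
assert result["mentions_tool"] and result["no_api_key_leak"]
```

Checks like these stay deterministic even when the agent itself is not, which may be why the study finds them so much more common than agent-specific evaluators.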
More in Models

Failure Mechanisms and Risk Estimation for Legged Robot Locomotion on Granular Slopes
arXiv:2603.06928v2 Announce Type: replace Abstract: Locomotion on granular slopes such as sand dunes remains a fundamental challenge for legged robots due to reduced shear strength and gravity-induced anisotropic yielding of granular media. Using a hexapedal robot on a tiltable granular bed, we systematically measure locomotion speed together with slope-dependent normal and shear granular resistive forces. While normal penetration resistance remains nearly unchanged with inclination, shear resistance decreases substantially as slope angle increases. Guided by these measurements, we develop a simple robot-terrain interaction model that predicts anchoring timing, step length, and resulting robot speed, as functions of terrain strength and slope angle. The model reveals that slope-induced per

qwen3.5 vs gemma4 vs cloud llms in python turtle
I have found Python turtle to be a pretty good test for a model. All of these models received the same prompt: "write a python turtle program that draws a cat". You can actually see the similarity between Gemma's and Gemini Pro's outputs: they share the same color palette and minimalist approach in terms of detail. I have a 16 GB VRAM GPU, so I couldn't test the bigger versions of Qwen and Gemma without quantisation.

- gemma_4_31B_it_UD_IQ3_XXS.gguf
- Qwen3_5_9B_Q8_0.gguf
- Qwen_3_5_27B_Opus_Distilled_Q4_K_S.gguf
- deepseek from web browser with reasoning
- claude sonnet 4.6 extended
- gemini pro from web browser with thinking

submitted by /u/SirKvil
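Judging whether the drawing looks like a cat is manual, but a crude automatic sanity check is possible: parse each model's output and verify it is valid Python that actually calls turtle drawing methods. This is a hedged sketch under that assumption; `DRAW_CALLS` is an arbitrary subset of the turtle API chosen here, and `sample` is a toy stand-in, not any model's real output.

```python
import ast

# Crude pre-filter for generated turtle programs: valid Python syntax
# plus at least one call to a known turtle drawing method. Says nothing
# about whether the resulting drawing resembles a cat.

DRAW_CALLS = {"forward", "circle", "goto", "begin_fill", "end_fill"}

def uses_turtle(source: str) -> bool:
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False  # model produced invalid Python
    # Collect method names from attribute-style calls like t.circle(50).
    calls = {
        node.func.attr
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
    }
    return bool(calls & DRAW_CALLS)

sample = "import turtle\nt = turtle.Turtle()\nt.circle(50)\nt.forward(10)"
print(uses_turtle(sample))  # True
```

Checking the source statically avoids opening a Tk window for every candidate program, which also makes the filter usable on a headless machine.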


