Perplexity has a handful of MIT licensed embedding models
submitted by /u/richardanaya
Read on Reddit r/LocalLLaMA: https://www.reddit.com/r/LocalLLaMA/comments/1sdhquc/perplexity_has_a_handful_of_mit_licensed/

Qwen 27b and Other Dense Models Optimization
Hi all, I hadn't realized the KV cache quant made such a big difference, so I took my 64 GB M2 Max Mac Studio and switched from Qwen 3.5 35b a3b to the dense 27b. I love it, it's a huge difference, but I get maybe 3 tokens a second. I have KV cache at q8, offload to GPU, flash attention, mmap, max concurrent 4, eval batch 2048, CPU set to 8, GPU offload full (64). I'm on LM Studio and run everything through Openclaw. Just wondering if there's anything I can do to speed it up. The output is wonderful, but man, the slow speed causes some issues, especially for my scheduled jobs, even when I adjust them. If a heartbeat runs up against a regular message I'm f'd. Any tips would be greatly appreciated. submitted by /u/Jordanthecomeback

Qwen 3.5 Tool Calling Fixes for Agentic Use: What's Broken, What's Fixed, What You (may) Still Need
Posted - What follows after this introduction is generated by Claude Opus 4.6 after hundreds of back and forths with log analysis for tool calls that were not working, and Qwen 3.5 models getting confused from local llm providers as well as Nano-Gpt. I fixed it for my own use with Pi coding agent at the time. Some of the fixes that were needed are no longer needed (TLDR at the bottom) but most are still applicable, as validated today. If you use Qwen 3.5 models and are having issues with model performance, tool calls, or general instability, the reference below might be a useful read. In the end, the fixes below on pi coding agent + llamacpp + Bartowski's quants (for stability) is what took my experience to 99% reliability and quality with all Qwen 3.5 models (Q5_k_L). Hope it helps someon

Show HN: Gemma Gem – AI model embedded in a browser – no API keys, no cloud
Gemma Gem is a Chrome extension that loads Google's Gemma 4 (2B) through WebGPU in an offscreen document and gives it tools to interact with any webpage: read content, take screenshots, click elements, type text, scroll, and run JavaScript. You get a small chat overlay on every page. Ask it about the page and it (usually) figures out which tools to call. It has a thinking mode that shows chain-of-thought reasoning as it works. It's a 2B model in a browser. It works for simple page questions and running JavaScript, but multi-step tool chains are unreliable and it sometimes ignores its tools entirely. The agent loop has zero external dependencies and can be extracted as a standalone library if anyone wants to experiment with it. Comments URL: https://news.ycombinator.com/item?id=47655367 Poi


