Early-fusion hybrid CNN-transformer models for multiclass ovarian tumor ultrasound classification - Frontiers
<a href="https://news.google.com/rss/articles/CBMiogFBVV95cUxNSXRDUm95c0h4SWJrc0E1WGhlaDMwcGcyR2V2WkVERC1CbVBwTF80S3lOanJKMVpFelNDRmlGTVZ1SmdQZHBMZl9ZWVpyMHFDaWlBcXVhVVJ2TzEwSVY1WUlybkhCUWpSdTRLdzFSa2lxVHZEQ3RpT1hCUTBsNlZGd29LbjlYVUVKS3hNdGFRQy1GQkdSTFIzQmdxNDk3WWptWGc?oc=5" target="_blank">Early-fusion hybrid CNN-transformer models for multiclass ovarian tumor ultrasound classification</a> <font color="#6f6f6f">Frontiers</font>

Anyone else notice qwen 3.5 is a lying little shit
Any time I catch it messing up, it just lies and tries to hide its mistakes. This is the first model I've caught doing this multiple times. I've had LLMs hallucinate or be just completely wrong, but Qwen will say it did something; I call it out, then it doubles down on its lie ("I did do it like you asked"), and when I call it out again it half admits to being wrong. It's kinda funny how much it doesn't want to admit it didn't do what it was supposed to. submitted by /u/Cat5edope
APEX quantized MoE models: 33% faster inference, plus TurboQuant (14% speedup in prompt processing)
I've just released APEX (Adaptive Precision for EXpert Models): a novel MoE quantization technique that outperforms Unsloth Dynamic 2.0 on accuracy while being 2x smaller for MoE architectures. Benchmarked on Qwen3.5-35B-A3B, but the method applies to any MoE model. Half the size of Q8. Perplexity comparable to F16. Works with stock llama.cpp with no patches. Open source (of course!), built with the github.com/mudler/LocalAI team! Perplexity by itself doesn't tell the full story: KL divergence against the full-precision model's output distributions tells a story perplexity doesn't. Tiers for every GPU: - I-Quality: 21.3 GB
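The perplexity-vs-KL point above can be made concrete: two quantized models can assign the same probability to every actual next token (identical perplexity) while one of them has drifted far more from the reference model's full distribution. A minimal stdlib sketch, with toy hand-picked distributions (the values and variable names are illustrative, not APEX's actual evaluation code):

```python
import math

def perplexity(token_probs):
    # Perplexity from the probabilities a model assigned to the actual next tokens.
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))

def mean_kl(ref_dists, quant_dists):
    # Mean KL(ref || quant) across positions; each entry is a full-vocab distribution.
    total = 0.0
    for p, q in zip(ref_dists, quant_dists):
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(ref_dists)

# Reference (f16) distribution at one position; the actual token is index 0.
ref = [[0.7, 0.2, 0.1]]
quant_a = [[0.7, 0.2, 0.1]]  # matches the reference exactly
quant_b = [[0.7, 0.1, 0.2]]  # same prob on the actual token, tail reshuffled

# Both quants give the actual token p = 0.7, so perplexity is identical...
print(perplexity([0.7]), perplexity([0.7]))          # ~1.4286 for both
# ...but KL exposes that quant_b's distribution has drifted.
print(mean_kl(ref, quant_a), mean_kl(ref, quant_b))  # 0.0 vs ~0.069
```

This is why a KL-divergence chart can separate quantization schemes that a perplexity table makes look interchangeable.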
Running SmolLM2‑360M on a Samsung Galaxy Watch 4 (380MB RAM) – 74% RAM reduction in llama.cpp
I’ve got SmolLM2‑360M running on a Samsung Galaxy Watch 4 Classic (about 380MB free RAM) by tweaking llama.cpp and the underlying ggml memory model. By default, the model was being loaded twice in RAM: once via the APK’s mmap page cache and again via ggml’s tensor allocations, peaking at 524MB for a 270MB model. The fix: I pass host_ptr into llama_model_params, so CPU tensors point directly into the mmap region and only Vulkan tensors are copied. On real hardware this gives:
- Peak RAM: 524MB → 142MB (74% reduction)
- First boot: 19s → 11s
- Second boot: ~2.5s (mmap + KV cache warm)
Code: https://github.com/Perinban/llama.cpp/tree/axon‑dev
Longer write‑up with VmRSS traces and design notes: https://www.linkedin.com/posts/perinban-parameshwaran_machinelearning-llm-embeddedai-activity-74453741179
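The underlying trick (point consumers at the mmap'd file instead of copying its bytes onto the heap) can be shown with the Python stdlib alone. This is a sketch of the general idea, not the author's fork: the file contents, slice offsets, and "tensor" names are made up, and `host_ptr` itself exists only in that fork, not upstream llama.cpp.

```python
import mmap
import os
import tempfile

# Stand-in for a model file (fake weight bytes, not a real GGUF).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes(range(256)) * 16)  # 4 KiB of fake weight data

f = open(path, "rb")
mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Zero-copy "tensors": slices of a memoryview over the mapped file are backed
# by the kernel page cache rather than duplicated on the heap -- analogous to
# pointing CPU tensors into the mmap region so only GPU-bound tensors get copied.
weights = memoryview(mm)
tensor_a = weights[0:1024]
tensor_b = weights[1024:2048]

print(len(tensor_a), tensor_a[0], tensor_b[0])  # 1024 0 0
```

Because the views alias the page cache, RSS grows only with pages actually touched, which is why eliminating the second heap copy collapses peak RAM the way the post's numbers show.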
