Avoid Re-encoding Reference Images in Vision-LLM When Comparison Criteria Are User-Defined

discuss.huggingface.co · by yaroslav332 · April 2, 2026

Hi everyone,

I’m working with a Vision-LLM (such as Qwen-VL, LLaVA, or llama.cpp-based multimodal models) where I need to compare new images against reference images. The key part of my use case is that users define the comparison criteria (e.g., fur length, ear shape, color patterns), and I’m using image-to-text models to evaluate how well a new image matches a reference according to these criteria.

Currently, every time I send a prompt including the reference images, the model re-encodes them from scratch. From the logs, I can see:

    llama-server
    encoding image slice...
    image slice encoded in 3800–4800 ms
    decoding image batch ...

Even for the same reference images, this happens on every single request, which makes inference slow.

Questions:

Has anyone dealt with user-defined comparison criteria
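To make the goal in the question concrete, here is a minimal sketch of the general idea of encoding each reference image once and reusing the result across requests with different criteria. It is only an illustration under stated assumptions: `ReferenceEmbeddingCache` and `fake_encoder` are hypothetical names, the encoder is a stand-in rather than a real vision model, and the post does not confirm that llama-server lets a client supply precomputed image embeddings.

```python
import hashlib
from typing import Callable, Dict, List


class ReferenceEmbeddingCache:
    """Hypothetical cache: stores one embedding per distinct image content."""

    def __init__(self, encode_fn: Callable[[bytes], List[float]]) -> None:
        self._encode_fn = encode_fn          # the slow vision-encoder call
        self._cache: Dict[str, List[float]] = {}
        self.encode_calls = 0                # counts real encoder invocations

    def get(self, image_bytes: bytes) -> List[float]:
        # Key on the image content, so the same file always hits the cache.
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._cache:
            self.encode_calls += 1
            self._cache[key] = self._encode_fn(image_bytes)
        return self._cache[key]


def fake_encoder(image_bytes: bytes) -> List[float]:
    # Assumption: placeholder for the real image encoder, which in the
    # post's logs takes several seconds per image slice.
    return [b / 255.0 for b in image_bytes[:4]]


cache = ReferenceEmbeddingCache(fake_encoder)
reference = b"\x00\x80\xff\x10 reference image bytes"

# Three requests with different user-defined criteria, same reference image.
for criterion in ("fur length", "ear shape", "color patterns"):
    embedding = cache.get(reference)

print(cache.encode_calls)
```

Keying on a content hash rather than a file name means a renamed or re-uploaded copy of the same reference still reuses the cached result. Whether the cached embedding can actually be fed back into a given inference server depends on that server, which is the open question here.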

Original source: https://discuss.huggingface.co/t/avoid-re-encoding-reference-images-in-vision-llm-when-comparison-criteria-are-user-defined/174897
