Failing to use Qwen3.5-397B-A17B through HF inference
Did something change about this model? I used to have no issues running this model inside Zed Editor, but today for some reason I am getting this error: {"status":400,"error":"BAD REQUEST","message":"payload validation: max_completion_tokens is limited to 16384 for qwen3.5-397b-a17b"} Even when I lower the max_completion_tokens parameter below that limit in Zed, it doesn't do anything; the error still happens. Does anyone have any idea what's going on?
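The error above suggests the provider now rejects any request whose max_completion_tokens exceeds 16384 for this model. If the editor keeps sending a larger value, one workaround is to clamp the parameter client-side before the request goes out. A minimal sketch, assuming an OpenAI-style request payload; the limit table and field names here are assumptions based on the error message, not confirmed provider behavior:

```python
# Hypothetical per-model limits inferred from the 400 error; not an official table.
MODEL_LIMITS = {"qwen3.5-397b-a17b": 16384}

def clamp_max_completion_tokens(payload: dict) -> dict:
    """Return a copy of an OpenAI-style chat payload with
    max_completion_tokens capped at the model's known limit."""
    limit = MODEL_LIMITS.get(payload.get("model", "").lower())
    requested = payload.get("max_completion_tokens")
    if limit is not None and isinstance(requested, int) and requested > limit:
        payload = {**payload, "max_completion_tokens": limit}
    return payload

payload = {
    "model": "qwen3.5-397b-a17b",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_completion_tokens": 32768,  # over the limit reported by the API
}
safe = clamp_max_completion_tokens(payload)
```

If the error persists even with a compliant value, the client is likely overriding the setting (or sending a default) somewhere else in the request.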
Read the full topic on discuss.huggingface.co: https://discuss.huggingface.co/t/failing-to-use-qwen3-5-397b-a17b-through-hf-inference/174868

More about: model
Abliterating Qwen3.5-397B on a Mac Studio revealed that MoE models encode refusal differently than dense models — safety refusals route through expert selection and survive weight-baking
Part of a series documenting building a fully local AI assistant on DGX Sparks + Mac Studio. I adapted FailSpy's abliteration technique for Qwen3.5-397B-A17B at 4-bit on a Mac Studio M3 Ultra (512GB). The goal was removing PRC censorship (Tiananmen, Taiwan, Uyghurs, Winnie the Pooh) from my personal assistant. Three findings I haven't seen documented anywhere: MoE models have two separable refusal subspaces. Chinese-political and Western-safety refusals are different directions in activation space. You can surgically remove one without touching the other. I removed PRC censorship while leaving drug/weapons refusals intact. Winnie the Pooh should not be a controversial topic on hardware I paid for. Weight-baking and inference hooking produce different results on MoE. On dense models, orthog
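The core operation behind abliteration is directional ablation: estimate a "refusal direction" in activation space and project it out of the hidden states. A toy sketch of that projection step, with random vectors standing in for real activations; in practice the direction is estimated from mean activation differences between refused and answered prompts, which this example does not do:

```python
import numpy as np

def ablate_direction(hidden: np.ndarray, refusal_dir: np.ndarray) -> np.ndarray:
    """Remove the component of `hidden` that lies along `refusal_dir`."""
    r = refusal_dir / np.linalg.norm(refusal_dir)  # unit refusal direction
    return hidden - (hidden @ r) * r               # orthogonal projection

rng = np.random.default_rng(0)
r = rng.normal(size=8)   # toy stand-in for an estimated refusal direction
h = rng.normal(size=8)   # toy stand-in for a hidden state
h_clean = ablate_direction(h, r)
```

The article's finding is that on MoE models this projection interacts with expert routing, so baking it into the weights and hooking it at inference time can diverge, unlike on dense models.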

A beginner's guide to the Nano-Banana-2 model by Google on Replicate
This is a simplified guide to an AI model called Nano-Banana-2, maintained by Google. If you like these kinds of analyses, you should join AImodels.fyi or follow us on Twitter. Model overview nano-banana-2 is Google's fast image generation model built for speed and quality. It combines conversational editing capabilities with multi-image fusion and character consistency, making it a versatile tool for creative projects. Compared to nano-banana-pro, this version offers a balance between performance and resource efficiency. The model also supports real-time grounding through Google Web Search and Image Search, allowing it to generate images based on current events and visual references from the internet. Model inputs and outputs The model accepts text prompts along with optional reference

Stop Prompting; Use the Design-Log Method to Build Predictable Tools
The article by Yoav Abrahami introduces the Design-Log Methodology, a structured approach to using AI in software development that combats the "context wall" — where AI models lose track of project history and make inconsistent decisions as codebases grow. The core idea is to maintain a version-controlled ./design-log/ folder in a Git repository, filled with markdown documents that capture design decisions, discussions, and implementation plans at the time they were made. This log acts as a shared brain between the developer and the AI, enabling the AI to act as a collaborative architect rather than just a code generator. By enforcing rules like read before you write, design before implementation, and immutable history, the methodology ensures consistency, reduces errors, and makes AI-assi
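The workflow described above can be sketched as a small helper that appends timestamped, never-overwritten markdown entries to a ./design-log/ folder. The file-naming scheme and front matter here are my own assumptions for illustration, not part of the article's methodology:

```python
from datetime import datetime, timezone
from pathlib import Path

def write_design_log(root: Path, title: str, body: str) -> Path:
    """Append a new timestamped design-log entry as markdown.
    Existing entries are never modified (immutable history)."""
    root.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    slug = "-".join(title.lower().split())
    path = root / f"{stamp}-{slug}.md"
    if path.exists():
        # Enforce the "immutable history" rule: entries are append-only.
        raise FileExistsError(f"design-log entry already exists: {path}")
    path.write_text(f"# {title}\n\nDate: {stamp} (UTC)\n\n{body}\n")
    return path

entry = write_design_log(Path("./design-log"), "Adopt SQLite for cache",
                         "Decision: use SQLite; discussion and alternatives go here.")
```

Because each entry is a plain versioned markdown file, the AI can be told to read the folder before proposing changes, which is the "read before you write" rule in practice.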
More in Models

[D] How to break free from LLM's chains as a PhD student?
I didn't realize it, but over the period of one year I have become over-reliant on ChatGPT to write code. I am a second-year PhD student and don't want to end up as someone with fake "coding skills" after I graduate. I hear people say all the time to use LLMs for the boring parts of the code and write the core stuff yourself, but the truth is, LLMs are getting better and better at writing even those parts if you write the prompt well (or at least they give you a template you can play around with to cross the finish line). Even PhD advisors are well aware that their students are using LLMs to assist in research work, and they mentally expect quicker results. I am currently trying to cope with imposter syndrome because my advisor is happy with my progress. But deep down I know that not 100%
