Failing to use Qwen3.5-397B-A17B through HF inference
Did something change about this model? I used to have no issues running this model inside Zed Editor, but today for some reason I am getting this error: {"status":400,"error":"BAD REQUEST","message":"payload validation: max_completion_tokens is limited to 16384 for qwen3.5-397b-a17b"} Even when I lower the max_completion_tokens parameter below that limit in Zed, it doesn't do anything; the error still happens. Does anyone have any idea what's going on?
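The error above suggests the provider now rejects any request whose max_completion_tokens exceeds 16384 for this model. If the editor keeps sending a larger value, one workaround is to clamp the parameter client-side before the request goes out. A minimal sketch, assuming an OpenAI-style request payload; the limit table and field names here are assumptions based on the error message, not confirmed provider behavior:

```python
# Hypothetical per-model limits inferred from the 400 error; not an official table.
MODEL_LIMITS = {"qwen3.5-397b-a17b": 16384}

def clamp_max_completion_tokens(payload: dict) -> dict:
    """Return a copy of an OpenAI-style chat payload with
    max_completion_tokens capped at the model's known limit."""
    limit = MODEL_LIMITS.get(payload.get("model", "").lower())
    requested = payload.get("max_completion_tokens")
    if limit is not None and isinstance(requested, int) and requested > limit:
        payload = {**payload, "max_completion_tokens": limit}
    return payload

payload = {
    "model": "qwen3.5-397b-a17b",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_completion_tokens": 32768,  # over the limit reported by the API
}
safe = clamp_max_completion_tokens(payload)
```

If the error persists even with a compliant value, the client is likely overriding the setting (or sending a default) somewhere else in the request.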
Read the full topic on discuss.huggingface.co: https://discuss.huggingface.co/t/failing-to-use-qwen3-5-397b-a17b-through-hf-inference/174868

More about: model
Abliterating Qwen3.5-397B on a Mac Studio revealed that MoE models encode refusal differently than dense models — safety refusals route through expert selection and survive weight-baking
Part of a series documenting building a fully local AI assistant on DGX Sparks + Mac Studio. I adapted FailSpy's abliteration technique for Qwen3.5-397B-A17B at 4-bit on a Mac Studio M3 Ultra (512GB). The goal was removing PRC censorship (Tiananmen, Taiwan, Uyghurs, Winnie the Pooh) from my personal assistant. Three findings I haven't seen documented anywhere: MoE models have two separable refusal subspaces. Chinese-political and Western-safety refusals are different directions in activation space. You can surgically remove one without touching the other. I removed PRC censorship while leaving drug/weapons refusals intact. Winnie the Pooh should not be a controversial topic on hardware I paid for. Weight-baking and inference hooking produce different results on MoE. On dense models, orthog
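The core operation behind abliteration is directional ablation: estimate a "refusal direction" in activation space and project it out of the hidden states. A toy sketch of that projection step, with random vectors standing in for real activations; in practice the direction is estimated from mean activation differences between refused and answered prompts, which this example does not do:

```python
import numpy as np

def ablate_direction(hidden: np.ndarray, refusal_dir: np.ndarray) -> np.ndarray:
    """Remove the component of `hidden` that lies along `refusal_dir`."""
    r = refusal_dir / np.linalg.norm(refusal_dir)  # unit refusal direction
    return hidden - (hidden @ r) * r               # orthogonal projection

rng = np.random.default_rng(0)
r = rng.normal(size=8)   # toy stand-in for an estimated refusal direction
h = rng.normal(size=8)   # toy stand-in for a hidden state
h_clean = ablate_direction(h, r)
```

The article's finding is that on MoE models this projection interacts with expert routing, so baking it into the weights and hooking it at inference time can diverge, unlike on dense models.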

A beginner's guide to the Nano-Banana-2 model by Google on Replicate
This is a simplified guide to an AI model called Nano-Banana-2, maintained by Google. If you like these kinds of analyses, you should join AImodels.fyi or follow us on Twitter. Model overview nano-banana-2 is Google's fast image generation model built for speed and quality. It combines conversational editing capabilities with multi-image fusion and character consistency, making it a versatile tool for creative projects. Compared to nano-banana-pro, this version offers a balance between performance and resource efficiency. The model also supports real-time grounding through Google Web Search and Image Search, allowing it to generate images based on current events and visual references from the internet. Model inputs and outputs The model accepts text prompts along with optional reference

Stop Prompting; Use the Design-Log Method to Build Predictable Tools
The article by Yoav Abrahami introduces the Design-Log Methodology, a structured approach to using AI in software development that combats the "context wall" — where AI models lose track of project history and make inconsistent decisions as codebases grow. The core idea is to maintain a version-controlled ./design-log/ folder in a Git repository, filled with markdown documents that capture design decisions, discussions, and implementation plans at the time they were made. This log acts as a shared brain between the developer and the AI, enabling the AI to act as a collaborative architect rather than just a code generator. By enforcing rules like read before you write, design before implementation, and immutable history, the methodology ensures consistency, reduces errors, and makes AI-assi
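The workflow described above can be sketched as a small helper that appends timestamped, never-overwritten markdown entries to a ./design-log/ folder. The file-naming scheme and front matter here are my own assumptions for illustration, not part of the article's methodology:

```python
from datetime import datetime, timezone
from pathlib import Path

def write_design_log(root: Path, title: str, body: str) -> Path:
    """Append a new timestamped design-log entry as markdown.
    Existing entries are never modified (immutable history)."""
    root.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    slug = "-".join(title.lower().split())
    path = root / f"{stamp}-{slug}.md"
    if path.exists():
        # Enforce the "immutable history" rule: entries are append-only.
        raise FileExistsError(f"design-log entry already exists: {path}")
    path.write_text(f"# {title}\n\nDate: {stamp} (UTC)\n\n{body}\n")
    return path

entry = write_design_log(Path("./design-log"), "Adopt SQLite for cache",
                         "Decision: use SQLite; discussion and alternatives go here.")
```

Because each entry is a plain versioned markdown file, the AI can be told to read the folder before proposing changes, which is the "read before you write" rule in practice.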
More in Models

[D] How to break free from LLM's chains as a PhD student?
I didn't realize it, but over the period of one year I have become over-reliant on ChatGPT to write code. I am a second-year PhD student and don't want to end up as someone with fake "coding skills" after I graduate. I hear people say all the time to use LLMs for the boring parts of the code and write the core stuff yourself, but the truth is, LLMs are getting better and better at writing even those parts if you write the prompt well (or at least they give you a template you can play around with to cross the finish line). Even PhD advisors are well aware that their students are using LLMs to assist in research work, and they mentally expect quicker results. I am currently trying to cope with imposter syndrome because my advisor is happy with my progress. But deep down I know that not 100%
