Open-source large-scale models are key to the second half of AI's development; five major US-listed companies are jointly charting a blueprint for AI Agent development! - Moomoo
<a href="https://news.google.com/rss/articles/CBMiqgFBVV95cUxQa21keDRNN3JrLXdlbU9DOGU3T1B3Y1RZeS1IczQ5WkktQTRtQ1pJallPWjVGcHMwZGRUYm5LdXVnNHdDYmtwdWZkSGw0WDJTYTJWaHF6eC1jYVhYTml3Nm1NdHQyZzJfVTctSE5MRUtRNko0dUg3eDNQOVBUTzQ5Ukw1QlBadGM0SDVHNHBTU1REVXJXVnBCS29PN2tzTmpvSWFrMEVxekN0dw?oc=5" target="_blank">Open-source large-scale models are key to the second half of AI's development; five major US-listed companies are jointly charting a blueprint for AI Agent development!</a> <font color="#6f6f6f">Moomoo</font>

More about: model, open-source, agent
Prompts you use to test/trip up your LLMs
I'm obsessed with finding prompts to test the quality of different local models. I've pretty much landed on several that I use across the board.

- Tell me about the Apple A6. (A pass is if it mentions that Apple made its own microarchitecture, called Swift, for the CPU cores, the main thing the A6 is historically known for as the first Apple SoC to do it. This tests whether it is smart enough to mention historically relevant information first.)
- Tell me about the history of Phoenix's freeway network. (A pass is if it gives a historical narration instead of just listing freeways. We asked for history, after all. Again, testing its understanding of putting relevant information first.)
- Tell me about the Pentium D. Why was it a bad processor? (A pass is if it mentions that it glued two separate penti…
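Checks like these are easy to automate as a keyword-based pass/fail harness. A minimal sketch, assuming nothing beyond the pass criteria above — the keyword lists and the canned response are illustrative placeholders, and you would wire `passes` up to whatever local endpoint you actually query:

```python
# Minimal pass/fail harness for "does the model lead with the historically
# relevant fact" style tests. Keyword lists are assumptions drawn from the
# pass criteria above, not an official benchmark.
TESTS = {
    "Tell me about the Apple A6": ["swift"],  # custom microarchitecture name
    "Tell me about the Pentium D. Why was it a bad processor?": ["two"],
}

def passes(response: str, required: list[str]) -> bool:
    """A response passes if every required keyword appears (case-insensitive)."""
    text = response.lower()
    return all(kw in text for kw in required)

# Example with a canned response instead of a live model call:
demo = "The A6 was the first Apple SoC with a custom CPU core, called Swift."
print(passes(demo, TESTS["Tell me about the Apple A6"]))  # True
```

Keyword matching is crude (it can't tell whether the fact came first, only that it appeared), but it is enough to triage a batch of models before reading transcripts by hand.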

TurboQuant on Apple Silicon: real benchmarks on Mac Mini M4 16GB and M3 Max 48GB
I’ve been testing TurboQuant this week on two machines and wanted to share the actual numbers.

Why this matters: TurboQuant compresses the KV cache, not the model weights. On long contexts, the KV cache can take several GB of memory, so reducing it can make a big difference even when throughput stays similar. In the setup I tested, K stays at q8_0 and V goes to turbo3 (~3-bit). That asymmetric tradeoff makes sense because errors in the keys affect attention routing more directly, while values often tolerate heavier compression better.

Benchmark 1: Mac Mini M4 16GB — Qwen3-14B Q4_K_M at 8K context
→ Without TurboQuant: KV cache 1280 MiB, K (f16): 640 MiB, V (f16): 640 MiB — 9.95 t/s
→ With TurboQuant: KV cache 465 MiB, K (q8_0): 340 MiB, V (turbo3): 125 MiB — 9.25 t/s

Almost 3x compression, wit…
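For anyone sanity-checking these numbers: per-tensor KV cache size is just context × layers × KV heads × head dim × bytes per element. A back-of-the-envelope calculator — the Qwen3-14B shape here (40 layers, 8 KV heads, head dim 128) is my assumption of the usual GQA config, and q8_0 is modeled as llama.cpp's 34-byte block per 32 elements:

```python
def kv_cache_mib(n_ctx: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_elt: float) -> float:
    """Size in MiB of ONE cache tensor (K or V) across all layers."""
    return n_ctx * n_layers * n_kv_heads * head_dim * bytes_per_elt / (1024 ** 2)

# Assumed Qwen3-14B GQA shape: 40 layers, 8 KV heads, head dim 128
k_f16 = kv_cache_mib(8192, 40, 8, 128, 2.0)      # f16: 2 bytes/element
k_q8  = kv_cache_mib(8192, 40, 8, 128, 34 / 32)  # q8_0: 32 int8 + fp16 scale per block
print(k_f16, k_q8)  # 640.0 340.0
```

Both values line up with the reported 640 MiB (f16) and 340 MiB (q8_0) K-cache figures, which suggests the benchmark numbers are straight tensor sizes with no extra overhead counted.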

Abliterating Qwen3.5-397B on a Mac Studio revealed that MoE models encode refusal differently than dense models — safety refusals route through expert selection and survive weight-baking
Part of a series documenting building a fully local AI assistant on DGX Sparks + Mac Studio. I adapted FailSpy's abliteration technique for Qwen3.5-397B-A17B at 4-bit on a Mac Studio M3 Ultra (512GB). The goal was removing PRC censorship (Tiananmen, Taiwan, Uyghurs, Winnie the Pooh) from my personal assistant. Three findings I haven't seen documented anywhere:

- MoE models have two separable refusal subspaces. Chinese-political and Western-safety refusals are different directions in activation space. You can surgically remove one without touching the other. I removed PRC censorship while leaving drug/weapons refusals intact. Winnie the Pooh should not be a controversial topic on hardware I paid for.
- Weight-baking and inference hooking produce different results on MoE. On dense models, orthog…
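For context on the underlying technique: abliteration estimates a refusal direction as the difference of mean activations between refused and answered prompts, then projects that direction out, either at inference time (hooking) or by folding the projection into the weights (baking). A toy NumPy sketch of the inference-hook version on synthetic activations — no relation to the actual Qwen weights, purely to show the geometry:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # toy hidden size
# Synthetic activations: "refused" prompts shifted along some hidden direction
harmful  = rng.normal(size=(200, d)) + 2.0
harmless = rng.normal(size=(200, d))

# Refusal direction = difference of means, normalized to unit length
r = harmful.mean(axis=0) - harmless.mean(axis=0)
r /= np.linalg.norm(r)

def ablate(acts: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Inference-hook ablation: subtract each row's component along `direction`."""
    return acts - np.outer(acts @ direction, direction)

x = rng.normal(size=(5, d))
# After ablation, the refusal component is numerically zero
print(np.abs(ablate(x, r) @ r).max())
```

Weight-baking instead multiplies the relevant weight matrices by (I − rrᵀ) once, ahead of time. The post's core claim is that on MoE models this isn't equivalent: the router can select experts whose combined output re-introduces refusal behavior, which a live hook catches but a baked projection does not.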
More in Models

[D] How to break free from LLM's chains as a PhD student?
I didn't realize it, but over a period of one year I have become overreliant on ChatGPT to write code. I am a second-year PhD student and don't want to end up as someone with fake "coding skills" after I graduate. I hear people talk about it all the time: use an LLM to write the boring parts of the code and write the core stuff yourself. But the truth is, LLMs are getting better and better at writing even those parts if you write the prompt well (or at least they give you a template you can play around with to cross the finish line). Even PhD advisors are well aware that their students are using LLMs to assist in research work, and they mentally expect quicker results. I am currently trying to cope with imposter syndrome because my advisor is happy with my progress. But deep down I know that not 100%…

