I have finally written up all the Qwen 3.6 test results I collected over the past few days. I compared two models in detail: Qwen3.6-35B-A3B (MoE, hybrid attention/delta) and Qwen3.6-27B (dense, hybrid attention/delta). I ran both as a llama.cpp server with turbo3 KV cache compression on an RTX 4090.
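For reference, a llama.cpp server launch with a quantized KV cache looks roughly like this. This is a minimal sketch, not my exact command: the model filename is a placeholder, the flags shown are the stock llama.cpp options for KV cache quantization (the turbo3 compression setting is configured separately and isn't shown), and flag names can differ between llama.cpp versions.

```shell
# Hedged sketch of a llama.cpp server launch (placeholder GGUF filename).
# -ngl 99 offloads all layers to the GPU (RTX 4090 in my case);
# -fa enables flash attention, which the V-cache quantization requires;
# --cache-type-k/v q8_0 quantize the KV cache to 8-bit to save VRAM.
llama-server \
  -m Qwen3.6-35B-A3B-Q4_K_M.gguf \
  -ngl 99 \
  -fa \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --port 8080
```

The same invocation with the dense 27B GGUF swapped in gives a like-for-like comparison, since both runs then share the same cache and offload settings.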

If I had to summarize briefly: the 35B-A3B is 3-4x faster across the board, but the 27B delivers noticeably better quality. It is the classic MoE vs. dense tradeoff, just backed by numbers this time.