<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Llama.cpp on ZoliBen Csupra(Kabra)</title><link>https://zoliben.com/en/tags/llama.cpp/</link><description>Recent content in Llama.cpp on ZoliBen Csupra(Kabra)</description><generator>Hugo</generator><language>en</language><lastBuildDate>Thu, 23 Apr 2026 12:00:00 +0000</lastBuildDate><atom:link href="https://zoliben.com/en/tags/llama.cpp/index.xml" rel="self" type="application/rss+xml"/><item><title>Qwen 3.6: 35B vs 27B comparison - benchmark results</title><link>https://zoliben.com/en/posts/2026-04-23-qwen-36-35b-vs-27b-benchmark-results/</link><pubDate>Thu, 23 Apr 2026 12:00:00 +0000</pubDate><guid>https://zoliben.com/en/posts/2026-04-23-qwen-36-35b-vs-27b-benchmark-results/</guid><description>&lt;p>I finally summarized all the Qwen 3.6 model test results I gathered over the past few days. I compared two models in detail: the &lt;strong>Qwen3.6-35B-A3B&lt;/strong> (MoE, hybrid attention/delta) and the &lt;strong>Qwen3.6-27B&lt;/strong> (dense, hybrid attention/delta). I ran both as llama.cpp servers with turbo3 KV cache compression on a single RTX 4090.&lt;/p>
&lt;p>If I had to summarize briefly: the 35B-A3B is &lt;strong>3-4x faster&lt;/strong> across the board, while the 27B delivers &lt;strong>better quality&lt;/strong>. It's the classic MoE vs. dense tradeoff, now backed by numbers.&lt;/p></description></item></channel></rss>