transformers

Fixed typo in MQA explanation

by foldl - opened 14 days ago

Files changed (1)

app/src/content/article.mdx

@@ -532,7 +532,7 @@ The [GQA paper](https://arxiv.org/abs/2305.13245) explains how grouped-query att
 </Sidenote>
-NanoChat uses Multi-Query Attention (MQA) to reduce the memory footprint of the KV cache, using 6 query heads but only 6 key/value heads (in the default config). This is a common configuration for smaller models like nanochat.
+NanoChat uses Multi-Query Attention (MQA) to reduce the memory footprint of the KV cache, using 6 query heads but only 1 key/value head (in the default config). This is a common configuration for smaller models like nanochat.
 <Sidenote>
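
To make the memory claim in the changed line concrete, here is a rough sketch of how KV cache size scales with the number of key/value heads. The helper `kv_cache_bytes` is hypothetical, and the layer count, head dimension, sequence length, and 2-byte element size are illustrative assumptions, not values taken from nanochat's config. Only the 6-versus-1 head counts come from the diff.

```python
def kv_cache_bytes(n_layer, n_kv_head, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Approximate bytes needed to cache keys and values for all layers."""
    # Factor 2: one cached tensor for keys and one for values per layer.
    return 2 * n_layer * n_kv_head * head_dim * seq_len * batch_size * bytes_per_elem


# Illustrative (assumed) model shape; not read from any real config.
N_LAYER = 12
HEAD_DIM = 128
SEQ_LEN = 2048

# 6 query heads with 6 key/value heads (one KV head per query head).
full = kv_cache_bytes(N_LAYER, n_kv_head=6, head_dim=HEAD_DIM, seq_len=SEQ_LEN)
# 6 query heads sharing a single key/value head (MQA).
mqa = kv_cache_bytes(N_LAYER, n_kv_head=1, head_dim=HEAD_DIM, seq_len=SEQ_LEN)

print(f"6 KV heads: {full / 2**20:.0f} MiB")
print(f"1 KV head:  {mqa / 2**20:.0f} MiB")
print(f"reduction:  {full // mqa}x")
```

Under these assumptions the cache shrinks by the ratio of key/value head counts, since the number of query heads does not enter the cache size at all.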