Fixed typo in MQA explanation #5
by foldl - opened
app/src/content/article.mdx CHANGED
@@ -532,7 +532,7 @@ The [GQA paper](https://arxiv.org/abs/2305.13245) explains how grouped-query att
 
 </Sidenote>
 
-NanoChat uses Multi-Query Attention (MQA) to reduce the memory footprint of the KV cache, using 6 query heads but only
+NanoChat uses Multi-Query Attention (MQA) to reduce the memory footprint of the KV cache, using 6 query heads but only 1 key/value head (in the default config). This is a common configuration for smaller models like nanochat.
 
 <Sidenote>
 
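For context on the corrected sentence, here is a rough sketch of why 6 query heads sharing 1 key/value head shrinks the KV cache. Only the head counts (6 and 1) come from the changed line. The layer count, head dimension, sequence length, and 2-byte element size are hypothetical values chosen for illustration, not taken from nanochat's config.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Approximate KV cache size for one sequence.

    Each layer stores two tensors (keys and values), each of shape
    (n_kv_heads, seq_len, head_dim).
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


# Head counts from the changed line in this PR.
n_query_heads = 6
n_kv_heads = 1

# Hypothetical values, for illustration only.
n_layers = 12
head_dim = 128
seq_len = 2048

# Standard multi-head attention keeps one K/V head per query head.
mha_bytes = kv_cache_bytes(n_layers, n_query_heads, head_dim, seq_len)
# Multi-query attention shares a single K/V head across all query heads.
mqa_bytes = kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len)

reduction = mha_bytes // mqa_bytes
print(f"MHA cache: {mha_bytes} bytes, MQA cache: {mqa_bytes} bytes, reduction: {reduction}x")
```

Under these assumptions the cache shrinks by the ratio of query heads to key/value heads, independent of the other dimensions.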