---
base_model:
- Qwen/Qwen3-8B
- Qwen/Qwen3-8B-Base
thumbnail: https://cdn-uploads.huggingface.co/production/uploads/633e85093a17ab61de8d9073/0z0yzAGcnw36qJ51eDl4Z.png
library_name: transformers
tags:
- mergekit
- merge
---
# CavesOfQwen3-8b

> Hey hey, Model Gang, KaraWitch here.
> Have you ever merged too deeply,
> and found something 'they' don't want you to know?
> "[CavesOfQwen3](https://youtu.be/o_PBfLbd3zw)", who is she? And why can't I reach her?

![image/png](https://cdn-uploads.huggingface.co/production/uploads/633e85093a17ab61de8d9073/0z0yzAGcnw36qJ51eDl4Z.png)

CavesOfQwen3-8b is a merge of the Qwen3-8B instruct model (Qwen3-8B) with its base model (Qwen3-8B-Base). The idea behind this merge is to remove the overbaked feeling of the instruct model while retaining its instruction-following behavior.

This is a merge of pre-trained language models created using ~~mergekit~~ mergekitty, with a couple of code patches to add Qwen3 support and an `o_proj` entry to the Qwen3 architecture configuration (otherwise vLLM gets very grumpy about it).

I used `TIES`. Not because I'm lazy, but because it's what I had lying around that isn't `SCE` or something else.
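For the curious: TIES works by trimming each model's task vector (its delta from the base) to the top-magnitude entries set by `density`, electing a per-parameter sign, and averaging only the deltas that agree with that sign. The sketch below is a toy numpy illustration of that idea on flat vectors, not mergekitty's actual implementation; the sign election here is a simplified version (sign of the weighted delta sum).

```python
import numpy as np

def ties_merge(base, tuned, densities, weights):
    """Toy TIES merge on flat parameter vectors (illustration only)."""
    deltas = []
    for t, d, w in zip(tuned, densities, weights):
        tv = t - base                                 # task vector
        k = max(1, int(round(d * tv.size)))           # keep top-d fraction by magnitude
        thresh = np.sort(np.abs(tv))[-k]              # (ties in magnitude may keep extras)
        tv = np.where(np.abs(tv) >= thresh, tv, 0.0)  # trim
        deltas.append(w * tv)
    deltas = np.stack(deltas)
    sign = np.sign(deltas.sum(axis=0))                # simplified sign election
    agree = np.where(np.sign(deltas) == sign, deltas, 0.0)
    denom = np.maximum((agree != 0).sum(axis=0), 1)   # avoid divide-by-zero
    return base + agree.sum(axis=0) / denom           # disjoint mean, added to base
```

With one tuned model and `density: 1.0` this just reproduces that model; with conflicting deltas, disagreeing entries cancel out instead of averaging toward mush, which is the point of TIES.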
## Model Results

(Thanks to [@SmerkyG](/SmerkyG))

| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|-------|------:|------|-----:|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.7478|± |0.0034|

| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|--------------|------:|------|-----:|----------|---|-----:|---|-----:|
|arc_challenge | 1|none | 0|acc |↑ |0.5392|± |0.0146|
| | |none | 0|acc_norm |↑ |0.5768|± |0.0144|
|arc_easy | 1|none | 0|acc |↑ |0.8178|± |0.0079|
| | |none | 0|acc_norm |↑ |0.7963|± |0.0083|
|hellaswag | 1|none | 0|acc |↑ |0.5906|± |0.0049|
| | |none | 0|acc_norm |↑ |0.7868|± |0.0041|
|lambada_openai| 1|none | 0|acc |↑ |0.7357|± |0.0061|
| | |none | 0|perplexity|↓ |3.3203|± |0.0674|
|piqa | 1|none | 0|acc |↑ |0.7933|± |0.0094|
| | |none | 0|acc_norm |↑ |0.7922|± |0.0095|
|sciq | 1|none | 0|acc |↑ |0.9630|± |0.0060|
| | |none | 0|acc_norm |↑ |0.9570|± |0.0064|
|winogrande | 1|none | 0|acc |↑ |0.7182|± |0.0126|

## Merge Details

### Merge Method

This model was merged using the [TIES](https://arxiv.org/abs/2306.01708) merge method with [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) as the base.

### Models Merged

The following models were included in the merge:

* [Qwen/Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: Qwen/Qwen3-8B
    parameters:
      density: 0.4
      weight: 0.35
  - model: Qwen/Qwen3-8B-Base
    parameters:
      density: 0.7
      weight: 1
merge_method: ties
base_model: Qwen/Qwen3-8B
parameters:
  normalize: true
dtype: bfloat16
```

### Disclaimer

> CavesOfQwen3 and its creator are not affiliated with Caves of Qud or the creator of the linked video.
> The reference is intentional, but it is meant as a lighthearted joke.
> There's no need to read into it any more deeply than "Haha, funny name."
> This disclaimer is for those who think otherwise, or for the overthinkers.