---
license: cc-by-nc-4.0
---

# CPRetriever-Code

**CPRetriever-Code** is a code embedding model trained via contrastive learning for **code-related retrieval tasks** in competitive programming. It achieves strong performance on tasks such as:

* **Text-to-Code** retrieval (problem description → relevant code)
* **Code-to-Code** retrieval (find alternate solutions to the same problem)

This model is part of the [CPRet](https://github.com/coldchair/CPRet) suite for competitive programming retrieval research.

## 🔧 Usage

You can load this model using the `sentence-transformers` library:

```python
from sentence_transformers import SentenceTransformer

# Load the model from the Hugging Face Hub.
model = SentenceTransformer("coldchair16/CPRetriever-Code")

# Encode a list of inputs (here, one code snippet); one embedding is returned per input.
embeddings = model.encode([
    "def mex_query(arr):\n    n = len(arr)\n    seen = set()\n    for i in range(n):\n        seen.add(arr[i])\n    i = 0\n    while True:\n        if i not in seen:\n            return i\n        i += 1"
])
```

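Once embeddings are available, retrieval reduces to ranking candidates by similarity to the query. The sketch below assumes cosine similarity as the scoring function, which this card does not state explicitly. It uses toy 3-dimensional vectors in place of real `model.encode` output so that it runs without downloading the model. The helper names `cosine_similarity` and `rank_candidates` are hypothetical and not part of CPRet.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def rank_candidates(query_vec, candidate_vecs, top_k=3):
    """Return (index, score) pairs for the top_k most similar candidates."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(candidate_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy vectors standing in for real embeddings (illustrative values only).
query = [1.0, 0.0, 0.0]
candidates = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
ranking = rank_candidates(query, candidates, top_k=2)
print(ranking)
```

In a real Text-to-Code setup, `query` would be the embedding of a problem description and `candidates` the embeddings of solution snippets. `sentence_transformers.util.cos_sim` offers the same computation on tensors.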
## 💡 Applications

This model is optimized for **code-level semantic retrieval** in competitive programming settings:

* **Text-to-Code**: Retrieve relevant code snippets given a natural language problem description.
* **Code-to-Code**: Retrieve alternative implementations of the same problem.

It is particularly effective for analyzing programming contest submissions, searching solution variants, and building educational tools for code understanding.

## Training and Evaluation

CPRetriever-Code is trained via **contrastive learning** using positive and hard negative code pairs derived from [CPRet-data](https://huggingface.co/datasets/coldchair16/CPRet-data).

For the training pipeline, see the full project:
[CPRet on GitHub](https://github.com/coldchair/CPRet?tab=readme-ov-file)

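To make the role of positives and hard negatives concrete, here is a minimal sketch of an InfoNCE-style contrastive loss for a single anchor. This card does not specify the exact objective, so the InfoNCE form, the function name `info_nce_loss`, and the temperature value `0.05` are all assumptions made for illustration. The CPRet repository is the authoritative source for the actual training code.

```python
import math

def info_nce_loss(sim_pos, sim_negs, temperature=0.05):
    """InfoNCE loss for one anchor: negative log-softmax of the positive's score.

    sim_pos  -- similarity between the anchor and its positive example
    sim_negs -- similarities between the anchor and its negative examples
    temperature -- hypothetical scaling value, not taken from the model card
    """
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[0]

# Easy negatives score far below the positive; hard negatives score close to it.
easy = info_nce_loss(0.9, [0.1, 0.0])
hard = info_nce_loss(0.9, [0.85, 0.8])
print(easy, hard)
```

Under this formulation, negatives that score close to the positive produce a larger loss than easy ones. That is the usual motivation for mining hard negatives.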

## 📦 Model Card

* Architecture: `Salesforce/SFR-Embedding-Code-2B_R` (encoder backbone)
* Training: Contrastive objective on code/code and text/code pairs
* Format: Compatible with `sentence-transformers`