Improve model card: Add project page, abstract, key results, and comprehensive tags
#2
opened by nielsr (HF Staff)

README.md CHANGED
---
library_name: transformers
license: apache-2.0
pipeline_tag: text-classification
tags:
- text-generation
- interpretable-ai
- concept-bottleneck
- llm
---

# Concept Bottleneck Large Language Models

This repository contains the model described in the paper [Concept Bottleneck Large Language Models](https://huggingface.co/papers/2412.07992), accepted at ICLR 2025.

- **Paper:** [Concept Bottleneck Large Language Models](https://huggingface.co/papers/2412.07992)
- **Project Page:** [https://lilywenglab.github.io/CB-LLMs/](https://lilywenglab.github.io/CB-LLMs/)
- **Code:** [https://github.com/Trustworthy-ML-Lab/CB-LLMs](https://github.com/Trustworthy-ML-Lab/CB-LLMs)
## Abstract

We introduce Concept Bottleneck Large Language Models (CB-LLMs), a novel framework for building inherently interpretable Large Language Models (LLMs). In contrast to traditional black-box LLMs that rely on limited post-hoc interpretations, CB-LLMs integrate intrinsic interpretability directly into the LLMs -- allowing accurate explanations with scalability and transparency. We build CB-LLMs for two essential NLP tasks: text classification and text generation. In text classification, CB-LLMs are competitive with, and at times outperform, traditional black-box models while providing explicit and interpretable reasoning. For the more challenging task of text generation, interpretable neurons in CB-LLMs enable precise concept detection, controlled generation, and safer outputs. The embedded interpretability empowers users to transparently identify harmful content, steer model behavior, and unlearn undesired concepts -- significantly enhancing the safety, reliability, and trustworthiness of LLMs, which are critical capabilities notably absent in existing models.
## Usage

For detailed installation instructions, training procedures, and usage examples (including how to test concept detection and steerability, and how to generate sentences), please refer to the [official GitHub repository](https://github.com/Trustworthy-ML-Lab/CB-LLMs).
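Since the card declares `library_name: transformers` and `pipeline_tag: text-classification`, the checkpoint should be loadable with the standard `transformers` pipeline. The sketch below is an assumption rather than the official entry point: the repo id is a placeholder, and it presumes the weights follow the standard sequence-classification layout.

```python
# Minimal sketch: load the checkpoint with the generic transformers
# text-classification pipeline.
from transformers import pipeline

# Hypothetical repo id; replace with this repository's actual Hub path.
model_id = "Trustworthy-ML-Lab/CB-LLM"

classifier = pipeline("text-classification", model=model_id)

# The label set depends on the training dataset (e.g. SST2 sentiment).
print(classifier("The movie was a delightful surprise from start to finish."))
# -> [{'label': ..., 'score': ...}]
```

The generic pipeline returns only the final label; for concept-level activations, steering, and the other interpretability features described above, the training and inference scripts in the official repository are likely the right entry point.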
## Key Results

### Part I: CB-LLM (classification)

CB-LLMs are competitive with the black-box model after applying Automatic Concept Correction (ACC).

| Method (Accuracy ↑) | SST2 | YelpP | AGnews | DBpedia |
|-------------------------------------|------------|------------|------------|------------|
| **Ours:** | | | | |
| CB-LLM | 0.9012 | 0.9312 | 0.9009 | 0.9831 |
| CB-LLM w/ ACC | **0.9407** | **<span style="color:blue">0.9806</span>** | **0.9453** | **<span style="color:blue">0.9928</span>** |
| **Baselines:** | | | | |
| TBM&C³M | 0.9270 | 0.9534 | 0.8972 | 0.9843 |
| RoBERTa-base fine-tuned (black-box) | 0.9462 | 0.9778 | 0.9508 | 0.9917 |
### Part II: CB-LLM (generation)

Accuracy, steerability, and perplexity of CB-LLMs (generation). CB-LLMs stay close to the black-box baseline on accuracy (↑) and perplexity (↓) while providing much higher steerability (↑), a capability the black-box model lacks entirely.

| Method | Metric | SST2 | YelpP | AGnews | DBpedia |
|----------------------------------|---------------|------------|------------|------------|------------|
| **CB-LLM (Ours)** | Accuracy↑ | 0.9638 | **0.9855** | 0.9439 | 0.9924 |
| | Steerability↑ | **0.82** | **0.95** | **0.85** | **0.76** |
| | Perplexity↓ | 116.22 | 13.03 | 18.25 | 37.59 |
| **CB-LLM w/o ADV training** | Accuracy↑ | 0.9676 | 0.9830 | 0.9418 | **0.9934** |
| | Steerability↑ | 0.57 | 0.69 | 0.52 | 0.21 |
| | Perplexity↓ | **59.19** | 12.39 | 17.93 | **35.13** |
| **Llama3 fine-tuned (black-box)**| Accuracy↑ | **0.9692** | 0.9851 | **0.9493** | 0.9919 |
| | Steerability↑ | No | No | No | No |
| | Perplexity↓ | 84.70 | **6.62** | **12.52** | 41.50 |
## Citation

If you find this work useful, please cite the paper:

```bibtex
@article{cbllm,
  title={Concept Bottleneck Large Language Models},
  author={Sun, Chung-En and Oikarinen, Tuomas and Ustun, Berk and Weng, Tsui-Wei},
  journal={ICLR},
  year={2025}
}
```