---
title: Sentiment Model Comparison
emoji: 🚀
colorFrom: pink
colorTo: indigo
sdk: streamlit
sdk_version: 5.37.0
app_file: app.py
pinned: false
license: mit
short_description: Compare sentiment predictions from two deep learning models
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 📊 Sentiment Model Comparison App

This Streamlit app compares two sentiment classification models trained on IMDB movie reviews.

- Model A: ~6.6M trainable params, 50k vocab (fast & lightweight)
- Model B: ~34M trainable params, 256k vocab (high capacity)
- Ensemble: Average of both predictions
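The ensemble entry can be read as a plain mean of the two models' positive-class probabilities. Here is a minimal sketch, assuming each model emits a single probability per review and that 0.5 is the decision threshold (neither is stated in this README). `ensemble_average` and `label_from_prob` are hypothetical helper names, not functions from the app.

```python
def ensemble_average(prob_a: float, prob_b: float) -> float:
    """Return the unweighted mean of two positive-class probabilities."""
    for p in (prob_a, prob_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return (prob_a + prob_b) / 2.0


def label_from_prob(prob: float, threshold: float = 0.5) -> str:
    """Map a probability to a label; the 0.5 threshold is an assumption."""
    return "positive" if prob >= threshold else "negative"
```

With an unweighted mean, a confident prediction from one model can be pulled back toward 0.5 by an uncertain prediction from the other, which is why the app shows all three values side by side.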

🔗 **Live Demo:** [Try it on Spaces](https://huggingface.co/spaces/Daksh0505/sentiment-model-comparison)

---

## 🔍 Features

- Enter single review text or upload a CSV (`review` column)
- Get predictions from both models + ensemble average
- Compare probabilities visually
- Submit feedback (saved to Google Sheets)
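For batch input, the uploaded CSV needs a `review` column. The app's own parsing code is not shown here, so the following is only a sketch of how such a file could be validated and read with the standard library. `read_reviews` is a hypothetical helper, and skipping empty rows is an assumption about the desired behavior.

```python
import csv
import io
from typing import List


def read_reviews(csv_text: str, column: str = "review") -> List[str]:
    """Return the non-empty review strings from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Fail early if the expected column is missing from the header row.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"CSV must contain a '{column}' column")
    reviews = []
    for row in reader:
        text = (row[column] or "").strip()
        if text:  # skip blank or missing cells (an assumption)
            reviews.append(text)
    return reviews
```

For example, `read_reviews("id,review\n1,Great film\n")` yields a one-element list that can then be tokenized and passed to both models.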


## 🧠 Models

### 🔹 Model A
- Filename: `sentiment_model_imdb_6.6M.keras`  
- **Trainable Parameters**: ~6.6 million  
- **Total Parameters**: ~13.06 million  
- **Vocabulary Size**: 50,000 tokens  
- Description: Lightweight and efficient; optimized for speed.

### 🔹 Model B
- Filename: `sentiment_model_imdb_34M.keras`  
- **Trainable Parameters**: ~34 million  
- **Total Parameters**: ~99.43 million  
- **Vocabulary Size**: 256,000 tokens  
- Description: Larger and more expressive; higher accuracy on nuanced reviews.

---

## 🗂 Tokenizers

Each model uses its own tokenizer in Keras JSON format:

- `tokenizer_50k.json` → used with Model A  
- `tokenizer_256k.json` → used with Model B  

---

## 🔧 Load Models & Tokenizers (from Hugging Face Hub)

```python
from huggingface_hub import hf_hub_download
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.text import tokenizer_from_json
import json

# === Model A ===
model_path_a = hf_hub_download(repo_id="Daksh0505/sentiment-model-imdb", filename="sentiment_model_imdb_6.6M.keras")
tokenizer_path_a = hf_hub_download(repo_id="Daksh0505/sentiment-model-imdb", filename="tokenizer_50k.json")

with open(tokenizer_path_a, "r") as f:
    tokenizer_a = tokenizer_from_json(json.load(f))

model_a = load_model(model_path_a)

# === Model B ===
model_path_b = hf_hub_download(repo_id="Daksh0505/sentiment-model-imdb", filename="sentiment_model_imdb_34M.keras")
tokenizer_path_b = hf_hub_download(repo_id="Daksh0505/sentiment-model-imdb", filename="tokenizer_256k.json")

with open(tokenizer_path_b, "r") as f:
    tokenizer_b = tokenizer_from_json(json.load(f))

model_b = load_model(model_path_b)
```
---

## 📁 Dataset

- **Source:** [IMDB Multi-Movie Dataset](https://huggingface.co/datasets/Daksh0505/IMDB-Reviews)


## Citation (Please add if you use this dataset)
```bibtex
@misc{imdb-multimovie-reviews,
  title = {IMDb Multi-Movie Review Dataset},
  author = {Daksh Bhardwaj},
  year = {2025},
  url = {https://huggingface.co/datasets/Daksh0505/IMDB-Reviews},
  note = {Accessed: 2025-07-17}
}
```
---