# PPOpt-Llama-3.1-8B-Instruct-LoRA

A LoRA adapter for Llama-3.1-8B-Instruct, fine-tuned for the prompt-optimization task.
## Model Description
This model is trained to optimize user prompts based on their interaction history and preferences. Given a user's conversation history and current query, it rewrites the query into a clearer, more specific, and better-structured prompt.
## Training Pipeline
- **Stage 1: SFT (Supervised Fine-Tuning)** - trained on curated prompt-optimization examples
- **Stage 2: GRPO (Group Relative Policy Optimization)** - reinforcement learning with GPT-4o-mini as the judge
## LoRA Configuration
| Parameter | Value |
|---|---|
| r (rank) | 32 |
| lora_alpha | 32 |
| target_modules | all-linear |
| lora_dropout | 0 |
| bias | none |
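The table above maps directly onto a PEFT `LoraConfig`. A minimal sketch of that configuration follows; `task_type="CAUSAL_LM"` is an assumption (standard for causal-LM fine-tuning), since the card does not state it.

```python
from peft import LoraConfig

# Adapter configuration matching the table above.
# task_type is an assumption, not confirmed by the card.
lora_config = LoraConfig(
    r=32,                         # LoRA rank
    lora_alpha=32,                # scaling factor
    target_modules="all-linear",  # apply LoRA to every linear layer
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)
```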
## Usage

### Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "YOUR_USERNAME/ppopt-llama-3.1-8b-lora")
```
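Once the adapter is loaded, the model can be called like any Llama-3.1-Instruct checkpoint. Below is a minimal generation sketch; the exact prompt format the adapter expects (e.g. whether history goes in a system or user turn) is an assumption, as the card does not specify it.

```python
# Hypothetical input: a vague user query the model should rewrite.
messages = [
    {"role": "user", "content": "make me a summary of this article pls"},
]

# Build the Llama-3.1 chat prompt and generate the optimized prompt.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```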
### Merge LoRA (Optional)
If you want to merge the adapter into the base model:
```python
merged_model = model.merge_and_unload()
merged_model.save_pretrained("merged_ppopt_llama8b")
tokenizer.save_pretrained("merged_ppopt_llama8b")
```
## Intended Use
This model is designed for:
- Prompt optimization/rewriting systems
- Personalized query enhancement based on user history
- Research on prompt engineering automation
## License
This model is released under the Apache 2.0 license.
## Model Tree (HowieHwong/ppopt)

- Base model: meta-llama/Llama-3.1-8B
- Fine-tuned from: meta-llama/Llama-3.1-8B-Instruct