Model Card for Model ID
This is an IDEFICS 9B model trained with PPO on the FrozenLake environment.
Model Details
Trainer Hyperparameters
suppress_warnings: True
debug: True
seed: 9812
reseed_env: True
torch_deterministic: True
track: True
wandb_project_name: "frozenlake_idefics"
wandb_entity: null #'rl-team-unito'
wandb_log_dir: "${now:%Y-%m-%d_%H-%M-%S}"
save_video: True
save_video_every: 20
save_stats: True
save_episode: False
env_size: 244
env_area: 8
num_prompt_images: 1
use_text_description: True
Algorithm-specific arguments
model: "HuggingFaceM4/idefics-9b-instruct"
model_ckpt: null
lora_adapter_path: null
is_slippery: False
fixed_orientation: True
no_step_description: False
first_person: True
fov: 1
total_timesteps: 400000
disable_training: False
from_accelerate_savestate_to_checkpoint: False
learning_rate: 1e-5
critic_learning_rate: 1e-5
local_num_envs: 4
num_steps: 128
anneal_lr: False
gamma: 0.99
gae_lambda: 0.95
num_minibatches: 128
update_epochs: 1
norm_adv: True
clip_coef: 0.1
clip_vloss: True
ent_coef: 0.01
vf_coef: 0.5
max_grad_norm: 0.5
target_kl: null
save_every: 50
gradient_accumulation: 4
adam_epsilon: 1e-8
gradient_ckpt: False
lora: True
temperature: 'max_logit'
disable_adapters_for_generation: True
normalization_by_words: False
action_logits_from_whole_seq: True
advanced_action_matching: False
env_id: "FrozenLakeText-v0" # MiniGrid-LavaGapS7-v0
generate_actions: False
value_prompt_template: "I am the agent in this minigrid world. {} Avoid the traps!\nWhat's the next best action?"
action_template: " Based on the information provided, the next best action would be to {}"
possible_actions_list: "forward pickup toggle opt_left opt_right opt_back"
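To make the listing above easier to read, here is a minimal Python sketch of two things the values imply. It is not taken from the training code. The batch bookkeeping assumes the common CleanRL-style PPO convention (rollout batch = `local_num_envs * num_steps`, split into `num_minibatches`), and the way gradient accumulation combines with minibatches is a guess. The prompt construction assumes the action template is simply appended to the formatted value prompt, once per entry in `possible_actions_list`. The function names `derived_sizes` and `candidate_sequences`, and the sample description string, are hypothetical.

```python
# Values copied verbatim from the hyperparameter listing.
cfg = {
    "total_timesteps": 400_000,
    "local_num_envs": 4,
    "num_steps": 128,
    "num_minibatches": 128,
    "gradient_accumulation": 4,
    "value_prompt_template": (
        "I am the agent in this minigrid world. {} Avoid the traps!\n"
        "What's the next best action?"
    ),
    "action_template": (
        " Based on the information provided, the next best action would be to {}"
    ),
    "possible_actions_list": "forward pickup toggle opt_left opt_right opt_back",
}


def derived_sizes(cfg):
    """Derive rollout sizes, ASSUMING CleanRL-style PPO bookkeeping."""
    # One rollout collects num_steps transitions from each environment.
    batch_size = cfg["local_num_envs"] * cfg["num_steps"]
    # Each update epoch splits the rollout into num_minibatches chunks.
    minibatch_size = batch_size // cfg["num_minibatches"]
    # Number of collect-then-update iterations over the whole run.
    num_updates = cfg["total_timesteps"] // batch_size
    # ASSUMPTION: gradients are accumulated over several minibatches
    # before each optimizer step; the real trainer may do this differently.
    samples_per_optimizer_step = minibatch_size * cfg["gradient_accumulation"]
    return {
        "batch_size": batch_size,
        "minibatch_size": minibatch_size,
        "num_updates": num_updates,
        "samples_per_optimizer_step": samples_per_optimizer_step,
    }


def candidate_sequences(cfg, description):
    """Build one prompt+action string per possible action.

    ASSUMPTION: the action template is concatenated directly after the
    formatted value prompt; the actual scoring pipeline is not documented.
    """
    prompt = cfg["value_prompt_template"].format(description)
    actions = cfg["possible_actions_list"].split()
    return {a: prompt + cfg["action_template"].format(a) for a in actions}


if __name__ == "__main__":
    for name, value in derived_sizes(cfg).items():
        print(f"{name}: {value}")
    # Hypothetical scene description, for illustration only.
    texts = candidate_sequences(cfg, "The goal is two tiles ahead.")
    print(texts["forward"])
```

Under these assumptions each rollout holds 512 transitions, each minibatch holds 4, and the run performs 781 updates. Since `generate_actions` is `False` and `action_logits_from_whole_seq` is `True`, scoring each candidate string rather than free-form generation seems a plausible reading of the setup, but the card does not confirm it.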
Model Sources [optional]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]
Uses
Direct Use
[More Information Needed]
Downstream Use [optional]
[More Information Needed]
Out-of-Scope Use
[More Information Needed]
Bias, Risks, and Limitations
[More Information Needed]
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
How to Get Started with the Model
Use the code below to get started with the model.
[More Information Needed]
Training Details
Training Data
[More Information Needed]
Training Procedure
Preprocessing [optional]
[More Information Needed]
Training Hyperparameters
- Training regime: [More Information Needed]
Speeds, Sizes, Times [optional]
[More Information Needed]
Evaluation
Testing Data, Factors & Metrics
Testing Data
[More Information Needed]
Factors
[More Information Needed]
Metrics
[More Information Needed]
Results
[More Information Needed]
Summary
Model Examination [optional]
[More Information Needed]
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
Technical Specifications [optional]
Model Architecture and Objective
[More Information Needed]
Compute Infrastructure
[More Information Needed]
Hardware
[More Information Needed]
Software
[More Information Needed]
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
[More Information Needed]
Model Card Authors [optional]
[More Information Needed]
Model Card Contact
[More Information Needed]
Framework versions