This prototype was made public to reproduce an issue in llama.cpp. It was trained with LLaMA-Factory.
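For reference, a minimal sketch of loading this LoRA adapter on top of the base model with PEFT (an assumption about intended usage; `trust_remote_code=True` is included on the guess that the GLM base model ships custom code):

```python
# Minimal sketch: load the LoRA adapter on top of the base model with PEFT.
# Not a verified recipe for this repo; repo IDs are taken from the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "zai-org/GLM-4.7-Flash",
    trust_remote_code=True,  # assumption: GLM models often require custom code
)
tokenizer = AutoTokenizer.from_pretrained(
    "zai-org/GLM-4.7-Flash", trust_remote_code=True
)
model = PeftModel.from_pretrained(base, "heiertech/GLM-4.7-Flash-Prototype-LoRa")
```

To exercise the adapter in llama.cpp itself, the usual route would be converting it to GGUF (e.g. with llama.cpp's `convert_lora_to_gguf.py`) and passing it via `--lora`; the exact steps depend on the issue being reproduced.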
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 0.1
- num_epochs: 1.0
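As a rough sketch, the configuration above maps onto transformers `TrainingArguments` roughly as follows (LLaMA-Factory builds these internally; the output path is a placeholder, and the `0.1` warmup value is read here as a ratio, since warmup step counts are integers, which is an assumption):

```python
from transformers import TrainingArguments

# Sketch of the training configuration above in transformers terms.
args = TrainingArguments(
    output_dir="glm-4.7-flash-lora",  # hypothetical placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=1,    # train_batch_size: 1
    per_device_eval_batch_size=8,     # eval_batch_size: 8
    gradient_accumulation_steps=8,    # total_train_batch_size = 1 * 8 = 8
    optim="adafactor",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                 # assumption: 0.1 interpreted as a ratio
    num_train_epochs=1.0,
    seed=42,
)
```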
Framework versions:
- PEFT 0.18.1
- Transformers 5.2.0
- PyTorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.2
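When reproducing the issue, it may help to match these versions. A quick check (package names are the PyPI ones; mapping PyTorch to the `torch` distribution is an assumption):

```python
# Compare installed package versions against the ones listed above.
from importlib.metadata import version

for pkg, expected in [
    ("peft", "0.18.1"),
    ("transformers", "5.2.0"),
    ("torch", "2.10.0"),
    ("datasets", "4.0.0"),
    ("tokenizers", "0.22.2"),
]:
    installed = version(pkg)
    status = "OK" if installed.startswith(expected) else "MISMATCH"
    print(f"{pkg}: installed {installed}, expected {expected} ({status})")
```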
This repository, heiertech/GLM-4.7-Flash-Prototype-LoRa, is a LoRA adapter for the base model zai-org/GLM-4.7-Flash.