This prototype was made public to reproduce an issue in llama.cpp. It was trained with LLaMA-Factory.
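For reference, a minimal sketch of loading this LoRA adapter on top of the base model with PEFT (an assumption about intended usage; `trust_remote_code=True` is included on the guess that the GLM base model ships custom code):

```python
# Minimal sketch: load the LoRA adapter on top of the base model with PEFT.
# Not a verified recipe for this repo; repo IDs are taken from the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "zai-org/GLM-4.7-Flash",
    trust_remote_code=True,  # assumption: GLM models often require custom code
)
tokenizer = AutoTokenizer.from_pretrained(
    "zai-org/GLM-4.7-Flash", trust_remote_code=True
)
model = PeftModel.from_pretrained(base, "heiertech/GLM-4.7-Flash-Prototype-LoRa")
```

To exercise the adapter in llama.cpp itself, the usual route would be converting it to GGUF (e.g. with llama.cpp's `convert_lora_to_gguf.py`) and passing it via `--lora`; the exact steps depend on the issue being reproduced.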
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 0.1
- num_epochs: 1.0
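As a rough sketch, the configuration above maps onto transformers `TrainingArguments` roughly as follows (LLaMA-Factory builds these internally; the output path is a placeholder, and the `0.1` warmup value is read here as a ratio, since warmup step counts are integers, which is an assumption):

```python
from transformers import TrainingArguments

# Sketch of the training configuration above in transformers terms.
args = TrainingArguments(
    output_dir="glm-4.7-flash-lora",  # hypothetical placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=1,    # train_batch_size: 1
    per_device_eval_batch_size=8,     # eval_batch_size: 8
    gradient_accumulation_steps=8,    # total_train_batch_size = 1 * 8 = 8
    optim="adafactor",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                 # assumption: 0.1 interpreted as a ratio
    num_train_epochs=1.0,
    seed=42,
)
```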
Framework versions:
- PEFT 0.18.1
- Transformers 5.2.0
- PyTorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.2
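When reproducing the issue, it may help to match these versions. A quick check (package names are the PyPI ones; mapping PyTorch to the `torch` distribution is an assumption):

```python
# Compare installed package versions against the ones listed above.
from importlib.metadata import version

for pkg, expected in [
    ("peft", "0.18.1"),
    ("transformers", "5.2.0"),
    ("torch", "2.10.0"),
    ("datasets", "4.0.0"),
    ("tokenizers", "0.22.2"),
]:
    installed = version(pkg)
    status = "OK" if installed.startswith(expected) else "MISMATCH"
    print(f"{pkg}: installed {installed}, expected {expected} ({status})")
```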
This repository, heiertech/GLM-4.7-Flash-Prototype-LoRa, is a LoRA adapter for the base model zai-org/GLM-4.7-Flash.