This prototype was made public to help reproduce an issue in llama.cpp.

Trained with LLaMA-Factory.
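Since this repository only contains a LoRA adapter, it has to be attached to its base model before it can be run or merged for llama.cpp. Below is a minimal sketch using PEFT and Transformers; the base model id is a placeholder (not confirmed by this card), and merging is optional:

```python
# Minimal sketch: attach the published LoRA adapter to its base model with PEFT.
# BASE_MODEL_ID is a placeholder; substitute the actual GLM-4.7-Flash base checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL_ID = "path/to/GLM-4.7-Flash-base"           # placeholder, not the real id
ADAPTER_ID = "heiertech/GLM-4.7-Flash-Prototype-LoRa"  # this repository

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID, trust_remote_code=True)

# Wrap the base model with the adapter weights.
model = PeftModel.from_pretrained(base, ADAPTER_ID)

# Optionally merge the LoRA weights into the base model, e.g. before converting
# the result to GGUF for llama.cpp.
merged = model.merge_and_unload()
merged.save_pretrained("glm-4.7-flash-prototype-merged")
tokenizer.save_pretrained("glm-4.7-flash-prototype-merged")
```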

The following hyperparameters were used during training (see the `TrainingArguments` sketch after the list):

  • learning_rate: 0.0001

  • train_batch_size: 1

  • eval_batch_size: 8

  • seed: 42

  • gradient_accumulation_steps: 8

  • total_train_batch_size: 8

  • optimizer: Adafactor (`OptimizerNames.ADAFACTOR`), with no additional optimizer arguments

  • lr_scheduler_type: cosine

  • lr_scheduler_warmup_steps: 0.1

  • num_epochs: 1.0

Framework versions:

  • PEFT 0.18.1

  • Transformers 5.2.0

  • PyTorch 2.10.0+cu128

  • Datasets 4.0.0

  • Tokenizers 0.22.2
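For reference, here is a minimal sketch of how the hyperparameters above map onto `transformers.TrainingArguments`. This is not the exact LLaMA-Factory invocation; the output directory is a placeholder, and the 0.1 warmup value is assumed to be a ratio rather than a literal step count:

```python
# Minimal sketch: the training hyperparameters listed above expressed as
# transformers.TrainingArguments (not the original LLaMA-Factory config).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="glm-4.7-flash-prototype-lora",  # placeholder output path
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,   # effective train batch size: 1 * 8 = 8
    optim="adafactor",               # OptimizerNames.ADAFACTOR, no extra args
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                # assumption: the listed 0.1 is a warmup ratio
    num_train_epochs=1.0,
)
```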
