Built with Axolotl

See axolotl config

axolotl version: 0.12.2

base_model: Qwen/Qwen2.5-VL-7B-Instruct
processor_type: AutoProcessor

# these 3 lines are needed for now to handle vision chat templates with images
skip_prepare_dataset: true
remove_unused_columns: false
sample_packing: false

chat_template: qwen2_vl
datasets:
  - path: e-zorzi/reasoning_distractors_choice_chat
    type: chat_template
    split: train

test_datasets:
  - path: e-zorzi/reasoning_distractors_choice_chat
    type: chat_template
    split: val_seen[:20%]
  - path: e-zorzi/reasoning_distractors_choice_chat
    type: chat_template
    split: val_unseen[:20%]


output_dir: ../ctex-persistent/outputs/qwen2_5_VL_7B_lora

load_in_8bit: True
adapter: lora
lora_model_dir:

sequence_len: 2048 #8192
pad_to_sequence_len: false

lora_r: 128
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'

wandb_project: axolotl_finetunes
wandb_entity: edo_vi
wandb_watch:
wandb_name: qwen_7B_2xH100
wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 32
num_epochs: 15
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.001

bf16: true
fp16:
tf32: true

gradient_checkpointing: true
logging_steps: 1
flash_attention: true
eager_attention:

warmup_steps: 60
evals_per_epoch: 2
saves_per_epoch: 1
save_strategy: epoch
weight_decay: 0.0

# save_first_step: true  # uncomment this to validate checkpoint saving works with your config
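The `lora_target_modules` entry in the config above is a regular expression rather than a list of names. The sketch below shows which module names such a pattern would select, assuming it is applied as a full-string match. The candidate names are hypothetical examples chosen for illustration and are not read from the actual checkpoint. It also computes the standard LoRA scaling factor, `lora_alpha / lora_r`, from the configured values.

```python
import re

# Pattern copied verbatim from lora_target_modules in the config.
PATTERN = r'model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'

# Hypothetical module names, for illustration only (not read from the model).
CANDIDATES = [
    "model.language_model.layers.0.self_attn.q_proj",
    "model.language_model.layers.27.mlp.down_proj",
    "model.visual.blocks.0.attn.qkv",
    "lm_head",
]


def is_targeted(name: str) -> bool:
    """Return True if the whole module name matches the pattern (assumed full match)."""
    return re.fullmatch(PATTERN, name) is not None


# Names the pattern would select for LoRA injection.
matched = [name for name in CANDIDATES if is_targeted(name)]

# Standard (non rank-stabilized) LoRA scales the adapter update by alpha / r.
lora_alpha, lora_r = 16, 128
lora_scaling = lora_alpha / lora_r

if __name__ == "__main__":
    for name in CANDIDATES:
        print(f"{name}: {'LoRA' if is_targeted(name) else 'skipped'}")
    print("scaling:", lora_scaling)
```

Under these assumptions only the language-model projection layers match, while vision-tower and output-head names are skipped. The unescaped dots in the pattern match any character, so the pattern is slightly looser than it looks.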

ctex-persistent/outputs/qwen2_5_VL_7B_lora

This model is a LoRA adapter fine-tuned from Qwen/Qwen2.5-VL-7B-Instruct on the e-zorzi/reasoning_distractors_choice_chat dataset. At the final evaluation (step 1160, epoch 14.5), it achieves the following results on the evaluation set:

  • Loss: 0.4372
  • Memory/max Mem Active (GiB): 74.46
  • Memory/max Mem Allocated (GiB): 74.46
  • Memory/device Mem Reserved (GiB): 77.0

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 128
  • total_eval_batch_size: 64
  • optimizer: adamw_bnb_8bit (OptimizerNames.ADAMW_BNB) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 60
  • training_steps: 1193
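The batch-size totals and the learning-rate schedule listed above follow from a few lines of arithmetic. The sketch below recomputes the totals from the per-device values. It also evaluates a cosine schedule with linear warmup, assuming the common shape in which the rate rises linearly to its peak over the warmup steps and then decays to zero along half a cosine period. The function name `lr_at` is introduced here for illustration, and the exact schedule used by the trainer may differ in detail.

```python
import math

# Values taken from the hyperparameter list.
micro_batch_size = 32
num_devices = 2
gradient_accumulation_steps = 2

# Samples consumed per optimizer step across all devices.
total_train_batch_size = micro_batch_size * num_devices * gradient_accumulation_steps
# Evaluation uses no gradient accumulation.
total_eval_batch_size = micro_batch_size * num_devices


def lr_at(step: int, peak_lr: float = 0.001, warmup_steps: int = 60,
          training_steps: int = 1193) -> float:
    """Learning rate at a given optimizer step (assumed warmup + cosine shape)."""
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, training_steps - warmup_steps)
    # Half-cosine decay from peak_lr down to 0.
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size)
    print("total_eval_batch_size:", total_eval_batch_size)
    for step in (0, 30, 60, 600, 1193):
        print(step, lr_at(step))
```

The recomputed totals agree with the reported `total_train_batch_size` of 128 and `total_eval_batch_size` of 64.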

Training results

| Training Loss | Epoch | Step | Validation Loss | Mem Active (GiB) | Mem Allocated (GiB) | Mem Reserved (GiB) |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 4.5266 | 48.86 | 48.86 | 53.12 |
| 0.5513 | 0.5 | 40 | 0.2914 | 62.74 | 62.74 | 75.83 |
| 0.4364 | 1.0 | 80 | 0.2502 | 62.74 | 62.74 | 75.83 |
| 0.3686 | 1.5 | 120 | 0.2506 | 62.74 | 62.74 | 75.83 |
| 0.3452 | 2.0 | 160 | 0.2555 | 62.74 | 62.74 | 75.83 |
| 0.3286 | 2.5 | 200 | 0.2622 | 62.74 | 62.74 | 75.83 |
| 0.3142 | 3.0 | 240 | 0.2632 | 64.48 | 64.48 | 76.08 |
| 0.2946 | 3.5 | 280 | 0.2688 | 64.48 | 64.48 | 76.08 |
| 0.2891 | 4.0 | 320 | 0.2723 | 64.48 | 64.48 | 76.08 |
| 0.2693 | 4.5 | 360 | 0.2816 | 64.48 | 64.48 | 76.08 |
| 0.2411 | 5.0 | 400 | 0.2857 | 73.23 | 73.23 | 76.08 |
| 0.235 | 5.5 | 440 | 0.2948 | 73.23 | 73.23 | 76.36 |
| 0.2112 | 6.0 | 480 | 0.3009 | 73.23 | 73.23 | 76.36 |
| 0.2148 | 6.5 | 520 | 0.3072 | 73.23 | 73.23 | 76.36 |
| 0.1858 | 7.0 | 560 | 0.3127 | 73.23 | 73.23 | 76.36 |
| 0.1778 | 7.5 | 600 | 0.3228 | 73.23 | 73.23 | 76.36 |
| 0.1698 | 8.0 | 640 | 0.3324 | 73.23 | 73.23 | 76.36 |
| 0.1658 | 8.5 | 680 | 0.3403 | 73.23 | 73.23 | 76.36 |
| 0.1459 | 9.0 | 720 | 0.3483 | 73.23 | 73.23 | 76.36 |
| 0.1393 | 9.5 | 760 | 0.3610 | 73.23 | 73.23 | 76.36 |
| 0.1277 | 10.0 | 800 | 0.3613 | 73.23 | 73.23 | 76.36 |
| 0.127 | 10.5 | 840 | 0.3798 | 73.23 | 73.23 | 76.36 |
| 0.1157 | 11.0 | 880 | 0.3880 | 73.23 | 73.23 | 77.0 |
| 0.1149 | 11.5 | 920 | 0.3996 | 73.23 | 73.23 | 77.0 |
| 0.1094 | 12.0 | 960 | 0.4083 | 73.23 | 73.23 | 77.0 |
| 0.1089 | 12.5 | 1000 | 0.4180 | 73.23 | 73.23 | 77.0 |
| 0.1082 | 13.0 | 1040 | 0.4222 | 74.46 | 74.46 | 77.0 |
| 0.1069 | 13.5 | 1080 | 0.4343 | 74.46 | 74.46 | 77.0 |
| 0.1044 | 14.0 | 1120 | 0.4351 | 74.46 | 74.46 | 77.0 |
| 0.104 | 14.5 | 1160 | 0.4372 | 74.46 | 74.46 | 77.0 |
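In the results above, the training loss keeps falling while the validation loss bottoms out early and then rises. A minimal sketch for locating the evaluation step with the lowest validation loss follows, using the (step, validation loss) pairs copied from the table. The value of 80 steps per epoch is inferred from the Epoch and Step columns. Whether a checkpoint from that step was retained is not stated in this card.

```python
# (step, validation loss) pairs copied from the training results table.
EVAL_HISTORY = [
    (0, 4.5266), (40, 0.2914), (80, 0.2502), (120, 0.2506), (160, 0.2555),
    (200, 0.2622), (240, 0.2632), (280, 0.2688), (320, 0.2723), (360, 0.2816),
    (400, 0.2857), (440, 0.2948), (480, 0.3009), (520, 0.3072), (560, 0.3127),
    (600, 0.3228), (640, 0.3324), (680, 0.3403), (720, 0.3483), (760, 0.3610),
    (800, 0.3613), (840, 0.3798), (880, 0.3880), (920, 0.3996), (960, 0.4083),
    (1000, 0.4180), (1040, 0.4222), (1080, 0.4343), (1120, 0.4351), (1160, 0.4372),
]

# Inferred from the table: step 40 is epoch 0.5, so one epoch is 80 steps.
STEPS_PER_EPOCH = 80

# Evaluation point with the lowest validation loss.
best_step, best_loss = min(EVAL_HISTORY, key=lambda row: row[1])
best_epoch = best_step / STEPS_PER_EPOCH

# Last evaluation point, which is the loss reported at the top of the card.
final_step, final_loss = EVAL_HISTORY[-1]

if __name__ == "__main__":
    print(f"best:  step {best_step} (epoch {best_epoch}), val loss {best_loss}")
    print(f"final: step {final_step}, val loss {final_loss}")
    print(f"gap:   {final_loss - best_loss:.4f}")
```

The lowest validation loss in the table is 0.2502 at step 80 (epoch 1.0), compared with 0.4372 at the final evaluation.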

Framework versions

  • PEFT 0.17.0
  • Transformers 4.55.2
  • Pytorch 2.6.0+cu126
  • Datasets 4.0.0
  • Tokenizers 0.21.4