Axolotl config

axolotl version: 0.12.2

base_model: Qwen/Qwen2.5-VL-7B-Instruct
processor_type: AutoProcessor

# these 3 lines are needed for now to handle vision chat templates with images
skip_prepare_dataset: true
remove_unused_columns: false
sample_packing: false

chat_template: qwen2_vl
datasets:
  - path: e-zorzi/reasoning_distractors_choice_chat
    type: chat_template
    split: train

test_datasets:
  - path: e-zorzi/reasoning_distractors_choice_chat
    type: chat_template
    split: val_seen[:20%]
  - path: e-zorzi/reasoning_distractors_choice_chat
    type: chat_template
    split: val_unseen[:20%]


output_dir: ../ctex-persistent/outputs/qwen2_5_VL_7B_lora

load_in_8bit: True
adapter: lora
lora_model_dir:

sequence_len: 2048 #8192
pad_to_sequence_len: false

lora_r: 128
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'

wandb_project: axolotl_finetunes
wandb_entity: edo_vi
wandb_watch:
wandb_name: qwen_7B_2xH100
wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 32
num_epochs: 15
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.001

bf16: true
fp16:
tf32: true

gradient_checkpointing: true
logging_steps: 1
flash_attention: true
eager_attention:

warmup_steps: 60
evals_per_epoch: 2
saves_per_epoch: 1
save_strategy: epoch
weight_decay: 0.0

# save_first_step: true  # uncomment this to validate checkpoint saving works with your config
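The LoRA settings above are easier to audit with a quick check. The sketch below assumes PEFT treats a string lora_target_modules as a regular expression that must match the full module name (re.fullmatch). It tests a few module names against the pattern and computes the standard LoRA scaling factor alpha / r. The module names are hypothetical and assume the layout of recent transformers releases, with the language model under model.language_model and the vision tower under model.visual.

```python
import re

# lora_target_modules copied verbatim from the config above.
TARGET_PATTERN = r'model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'

LORA_R = 128
LORA_ALPHA = 16


def is_lora_target(module_name: str, pattern: str = TARGET_PATTERN) -> bool:
    """True if the pattern matches the *whole* module name.

    Assumption: mirrors PEFT's handling of a string target_modules,
    which uses re.fullmatch rather than a substring search.
    """
    return re.fullmatch(pattern, module_name) is not None


def lora_scaling(r: int, alpha: int, use_rslora: bool = False) -> float:
    """Factor applied to the low-rank update B @ A.

    Standard LoRA scales by alpha / r; rank-stabilised LoRA by alpha / sqrt(r).
    """
    return alpha / (r ** 0.5) if use_rslora else alpha / r


# Hypothetical module names (assumed Qwen2.5-VL layout, for illustration only).
candidates = [
    "model.language_model.layers.0.self_attn.q_proj",
    "model.language_model.layers.27.mlp.down_proj",
    "model.language_model.layers.0.input_layernorm",
    "model.visual.blocks.0.mlp.gate_proj",
    "lm_head",
]

for name in candidates:
    status = "LoRA" if is_lora_target(name) else "frozen"
    print(f"{name:55s} -> {status}")

print("scaling =", lora_scaling(LORA_R, LORA_ALPHA))
```

Under these assumptions only the language-model attention and MLP projections receive adapters, the vision tower's MLP stays frozen even though its projections share the same names, and each update is scaled by 16 / 128 = 0.125. The dots in the pattern are unescaped and so match any character, which is harmless for these names.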

ctex-persistent/outputs/qwen2_5_VL_7B_lora

This model is a fine-tuned version of Qwen/Qwen2.5-VL-7B-Instruct on the e-zorzi/reasoning_distractors_choice_chat dataset. It achieves the following results on the evaluation set (the first 20% of the val_seen and val_unseen splits) at the last evaluation (step 1160):

Loss: 0.4372
Memory/max Mem Active (GiB): 74.46
Memory/max Mem Allocated (GiB): 74.46
Memory/device Mem Reserved (GiB): 77.0

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32 (per device)
eval_batch_size: 32 (per device)
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 2
total_train_batch_size: 128
total_eval_batch_size: 64
optimizer: ADAMW_BNB (8-bit AdamW from bitsandbytes, adamw_bnb_8bit in the config) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 60
training_steps: 1193
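The derived values in this list follow from the config: the global train batch is micro_batch_size × devices × gradient_accumulation_steps = 32 × 2 × 2 = 128, while evaluation has no accumulation, giving 32 × 2 = 64. The sketch below recomputes both and traces the learning-rate curve. It assumes the cosine schedule has the shape of transformers' get_cosine_schedule_with_warmup with its default half cycle and no minimum-LR floor; cosine_with_warmup is a hypothetical helper written for illustration.

```python
import math

# Values copied from the hyperparameter list above.
MICRO_BATCH = 32    # per-device train/eval batch size
NUM_DEVICES = 2
GRAD_ACCUM = 2
PEAK_LR = 1e-3
WARMUP_STEPS = 60
TOTAL_STEPS = 1193

total_train_batch = MICRO_BATCH * NUM_DEVICES * GRAD_ACCUM  # accumulation applies to training only
total_eval_batch = MICRO_BATCH * NUM_DEVICES


def cosine_with_warmup(step: int, peak: float = PEAK_LR,
                       warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape).

    Linear warmup from 0 to `peak`, then half a cosine period down to 0.
    """
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print("total_train_batch_size:", total_train_batch)
print("total_eval_batch_size:", total_eval_batch)
for s in (0, 30, 60, 400, 800, 1193):
    print(f"step {s:4d}: lr = {cosine_with_warmup(s):.2e}")
```

Under this assumption the learning rate peaks at 1e-3 at step 60 and decays to zero by step 1193.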

Training results

Training Loss	Epoch	Step	Validation Loss	Mem Active (GiB)	Mem Allocated (GiB)	Mem Reserved (GiB)
No log	0	0	4.5266	48.86	48.86	53.12
0.5513	0.5	40	0.2914	62.74	62.74	75.83
0.4364	1.0	80	0.2502	62.74	62.74	75.83
0.3686	1.5	120	0.2506	62.74	62.74	75.83
0.3452	2.0	160	0.2555	62.74	62.74	75.83
0.3286	2.5	200	0.2622	62.74	62.74	75.83
0.3142	3.0	240	0.2632	64.48	64.48	76.08
0.2946	3.5	280	0.2688	64.48	64.48	76.08
0.2891	4.0	320	0.2723	64.48	64.48	76.08
0.2693	4.5	360	0.2816	64.48	64.48	76.08
0.2411	5.0	400	0.2857	73.23	73.23	76.08
0.235	5.5	440	0.2948	73.23	73.23	76.36
0.2112	6.0	480	0.3009	73.23	73.23	76.36
0.2148	6.5	520	0.3072	73.23	73.23	76.36
0.1858	7.0	560	0.3127	73.23	73.23	76.36
0.1778	7.5	600	0.3228	73.23	73.23	76.36
0.1698	8.0	640	0.3324	73.23	73.23	76.36
0.1658	8.5	680	0.3403	73.23	73.23	76.36
0.1459	9.0	720	0.3483	73.23	73.23	76.36
0.1393	9.5	760	0.3610	73.23	73.23	76.36
0.1277	10.0	800	0.3613	73.23	73.23	76.36
0.127	10.5	840	0.3798	73.23	73.23	76.36
0.1157	11.0	880	0.3880	73.23	73.23	77.0
0.1149	11.5	920	0.3996	73.23	73.23	77.0
0.1094	12.0	960	0.4083	73.23	73.23	77.0
0.1089	12.5	1000	0.4180	73.23	73.23	77.0
0.1082	13.0	1040	0.4222	74.46	74.46	77.0
0.1069	13.5	1080	0.4343	74.46	74.46	77.0
0.1044	14.0	1120	0.4351	74.46	74.46	77.0
0.104	14.5	1160	0.4372	74.46	74.46	77.0
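The table shows validation loss reaching its minimum at step 80 (epoch 1.0, 0.2502) and rising at every later evaluation while training loss keeps falling overall. The loss reported at the top of this card (0.4372) is therefore the last evaluation, not the best one. The sketch below picks the best evaluation from the table values and shows where a simple patience rule, in the spirit of transformers' EarlyStoppingCallback, would have stopped. first_stop_step is a hypothetical helper, not a library function.

```python
# (step -> validation loss) copied from the table above; step 0 is the
# pre-training baseline and is left out.
VAL_LOSS = {
    40: 0.2914, 80: 0.2502, 120: 0.2506, 160: 0.2555, 200: 0.2622,
    240: 0.2632, 280: 0.2688, 320: 0.2723, 360: 0.2816, 400: 0.2857,
    440: 0.2948, 480: 0.3009, 520: 0.3072, 560: 0.3127, 600: 0.3228,
    640: 0.3324, 680: 0.3403, 720: 0.3483, 760: 0.3610, 800: 0.3613,
    840: 0.3798, 880: 0.3880, 920: 0.3996, 960: 0.4083, 1000: 0.4180,
    1040: 0.4222, 1080: 0.4343, 1120: 0.4351, 1160: 0.4372,
}
STEPS_PER_EPOCH = 80  # read off the table: epoch 1.0 is logged at step 80


def first_stop_step(losses, patience):
    """Step at which training would stop after `patience` evaluations
    in a row without a new best loss; None if it never triggers."""
    best = float("inf")
    evals_without_improvement = 0
    for step in sorted(losses):
        if losses[step] < best:
            best = losses[step]
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                return step
    return None


best_step = min(VAL_LOSS, key=VAL_LOSS.get)
best_loss = VAL_LOSS[best_step]
last_step = max(VAL_LOSS)
print(f"best: step {best_step} (epoch {best_step / STEPS_PER_EPOCH:.1f}), val loss {best_loss}")
print(f"last: step {last_step}, val loss {VAL_LOSS[last_step]}")
print("patience=3 would stop at step", first_stop_step(VAL_LOSS, patience=3))
```

With save_strategy: epoch, the epoch-1 checkpoint is the natural candidate for this adapter, assuming it was kept.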

Framework versions

PEFT 0.17.0
Transformers 4.55.2
Pytorch 2.6.0+cu126
Datasets 4.0.0
Tokenizers 0.21.4
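To reproduce this run, it may help to confirm that the local environment matches the versions listed above (plus axolotl 0.12.2 from the config header). A minimal stdlib-only sketch, assuming the PyPI distribution names axolotl, peft, transformers, torch, datasets, and tokenizers. It compares version strings exactly, so a PyTorch build without the +cu126 local label is also reported as a mismatch.

```python
from importlib import metadata

# Versions listed in this card, keyed by (assumed) PyPI distribution name.
PINNED = {
    "axolotl": "0.12.2",
    "peft": "0.17.0",
    "transformers": "4.55.2",
    "torch": "2.6.0+cu126",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def version_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, installed or None)} for every mismatch."""
    mismatches = {}
    for pkg, expected in pinned.items():
        try:
            installed = get_version(pkg)
        except metadata.PackageNotFoundError:
            installed = None  # not installed at all
        if installed != expected:
            mismatches[pkg] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for pkg, (want, have) in version_mismatches(PINNED).items():
        print(f"{pkg}: card lists {want}, found {have or 'not installed'}")
```

The get_version parameter exists only so the check can be exercised with a stand-in lookup instead of the real installed packages.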


Model tree for e-zorzi/Qwen2.5-VL-7B-Instruct-tuned-raw

Base model: Qwen/Qwen2.5-VL-7B-Instruct
This model: LoRA adapter on the base model