See axolotl config
axolotl version: 0.16.0.dev0
# =============================================================================
# Weights and Biases logging config
# =============================================================================
wandb_project: Infracelestial
wandb_name: qlora-sft
# =============================================================================
# Model + Saving
# =============================================================================
base_model: Mawdistical/Kuwutu-7B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true
chat_template: chatml
output_dir: ./output
saves_per_epoch: 50
save_safetensors: true
save_total_limit: 5
# =============================================================================
# MIXED PRECISION
# =============================================================================
bf16: true
fp16: false
tf32: false
# =============================================================================
# MODEL LOADING — QLoRA (4-bit NF4)
# =============================================================================
load_in_8bit: false
load_in_4bit: true
strict: false
# =============================================================================
# SEQUENCE CONFIG
# =============================================================================
sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false
# =============================================================================
# QLoRA ADAPTER
# =============================================================================
adapter: qlora
lora_r: 64
lora_alpha: 32
lora_dropout: 0.05
peft_use_dora: false
lora_target_modules:
- gate_proj
- down_proj
- up_proj
- q_proj
- v_proj
- k_proj
- o_proj
# =============================================================================
# DATASET CONFIGURATION
# =============================================================================
datasets:
- path: data/data.jsonl
type: chat_template
field_messages: conversations
message_property_mappings:
role: from
content: value
roles:
system:
- system
user:
- human
assistant:
- gpt
shuffle_merged_datasets: true
dataset_prepared_path: ./dataset_prepared
# =============================================================================
# EVALUATION
# =============================================================================
val_set_size: 0.0
# =============================================================================
# TRAINING PARAMETERS
# =============================================================================
num_epochs: 2
micro_batch_size: 2
gradient_accumulation_steps: 4
warmup_steps: 500
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 1e-5
loraplus_lr_ratio: 8
cosine_min_lr_ratio: 0.1
weight_decay: 0.1
max_grad_norm: 1
logging_steps: 10
# =============================================================================
# MEMORY OPTIMIZATION
# =============================================================================
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: false
flash_attention: true
# =============================================================================
# LOSS MONITORING
# =============================================================================
early_stopping_patience:
loss_watchdog_threshold: 100.0
loss_watchdog_patience: 3
# =============================================================================
# ADDITIONAL SETTINGS
# =============================================================================
debug: false
seed: 42
deepspeed: deepspeed_configs/zero2.json
output
This model is a fine-tuned version of Mawdistical/Kuwutu-7B on the data/data.jsonl dataset.
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 500
- training_steps: 22
Training results
Framework versions
- PEFT 0.18.1
- Transformers 5.5.3
- Pytorch 2.8.0+cu128
- Datasets 4.5.0
- Tokenizers 0.22.2
- Downloads last month
- 9
Model tree for Mawdistical-Brew/infracelestial-lite-LoRA
Base model
XiaomiMiMo/MiMo-7B-Base Finetuned
allura-org/Koto-Small-7B-PT Finetuned
Aurore-Reveil/Koto-Small-7B-IT Finetuned
Mawdistical/Kuwutu-7B