You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Log in or Sign Up to review the conditions and access this model content.

Built with Axolotl

See axolotl config

axolotl version: 0.16.0.dev0

# =============================================================================
# Weights and Biases logging config
# =============================================================================
wandb_project: Infracelestial
wandb_name: qlora-sft

# =============================================================================
# Model + Saving
# =============================================================================
base_model: Mawdistical/Kuwutu-7B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true
chat_template: chatml

output_dir: ./output
saves_per_epoch: 50
save_safetensors: true
save_total_limit: 5

# =============================================================================
# MIXED PRECISION
# =============================================================================
bf16: true
fp16: false
tf32: false

# =============================================================================
# MODEL LOADING — QLoRA (4-bit NF4)
# =============================================================================
load_in_8bit: false
load_in_4bit: true
strict: false

# =============================================================================
# SEQUENCE CONFIG
# =============================================================================
sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false

# =============================================================================
# QLoRA ADAPTER
# =============================================================================
adapter: qlora
lora_r: 64
lora_alpha: 32
lora_dropout: 0.05
peft_use_dora: false
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

# =============================================================================
# DATASET CONFIGURATION
# =============================================================================
datasets:
  - path: data/data.jsonl
    type: chat_template
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value
    roles:
      system:
        - system
      user:
        - human
      assistant:
        - gpt

shuffle_merged_datasets: true
dataset_prepared_path: ./dataset_prepared

# =============================================================================
# EVALUATION
# =============================================================================
val_set_size: 0.0

# =============================================================================
# TRAINING PARAMETERS
# =============================================================================
num_epochs: 2
micro_batch_size: 2
gradient_accumulation_steps: 4
warmup_steps: 500
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 1e-5
loraplus_lr_ratio: 8
cosine_min_lr_ratio: 0.1
weight_decay: 0.1
max_grad_norm: 1
logging_steps: 10

# =============================================================================
# MEMORY OPTIMIZATION
# =============================================================================
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
flash_attention: true

# =============================================================================
# LOSS MONITORING
# =============================================================================
early_stopping_patience:
loss_watchdog_threshold: 100.0
loss_watchdog_patience: 3

# =============================================================================
# ADDITIONAL SETTINGS
# =============================================================================
debug: false
seed: 42

deepspeed: deepspeed_configs/zero2.json

output

This model is a fine-tuned version of Mawdistical/Kuwutu-7B on the data/data.jsonl dataset.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 500
  • training_steps: 22

Training results

Framework versions

  • PEFT 0.18.1
  • Transformers 5.5.3
  • Pytorch 2.8.0+cu128
  • Datasets 4.5.0
  • Tokenizers 0.22.2
Downloads last month
9
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Mawdistical-Brew/infracelestial-lite-LoRA