You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

See axolotl config

axolotl version: 0.16.0.dev0

# =============================================================================
# Weights and Biases logging config
# =============================================================================
wandb_project: Infracelestial
wandb_name: qlora-sft

# =============================================================================
# Model + Saving
# =============================================================================
base_model: Mawdistical/Kuwutu-7B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true
chat_template: chatml

output_dir: ./output
saves_per_epoch: 50
save_safetensors: true
save_total_limit: 5

# =============================================================================
# MIXED PRECISION
# =============================================================================
bf16: true
fp16: false
tf32: false

# =============================================================================
# MODEL LOADING — QLoRA (4-bit NF4)
# =============================================================================
load_in_8bit: false
load_in_4bit: true
strict: false

# =============================================================================
# SEQUENCE CONFIG
# =============================================================================
sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false

# =============================================================================
# QLoRA ADAPTER
# =============================================================================
adapter: qlora
lora_r: 64
lora_alpha: 32
lora_dropout: 0.05
peft_use_dora: false
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

# =============================================================================
# DATASET CONFIGURATION
# =============================================================================
datasets:
  - path: data/data.jsonl
    type: chat_template
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value
    roles:
      system:
        - system
      user:
        - human
      assistant:
        - gpt

shuffle_merged_datasets: true
dataset_prepared_path: ./dataset_prepared

# =============================================================================
# EVALUATION
# =============================================================================
val_set_size: 0.0

# =============================================================================
# TRAINING PARAMETERS
# =============================================================================
num_epochs: 2
micro_batch_size: 2
gradient_accumulation_steps: 4
warmup_steps: 500
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 1e-5
loraplus_lr_ratio: 8
cosine_min_lr_ratio: 0.1
weight_decay: 0.1
max_grad_norm: 1
logging_steps: 10

# =============================================================================
# MEMORY OPTIMIZATION
# =============================================================================
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
flash_attention: true

# =============================================================================
# LOSS MONITORING
# =============================================================================
early_stopping_patience:
loss_watchdog_threshold: 100.0
loss_watchdog_patience: 3

# =============================================================================
# ADDITIONAL SETTINGS
# =============================================================================
debug: false
seed: 42

deepspeed: deepspeed_configs/zero2.json

output

This model is a fine-tuned version of Mawdistical/Kuwutu-7B on the data/data.jsonl dataset.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 4
total_train_batch_size: 8
optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 500
training_steps: 22

Training results

Framework versions

PEFT 0.18.1
Transformers 5.5.3
Pytorch 2.8.0+cu128
Datasets 4.5.0
Tokenizers 0.22.2

Downloads last month: 9

Model tree for Mawdistical-Brew/infracelestial-lite-LoRA

Base model

XiaomiMiMo/MiMo-7B-Base

Finetuned

allura-org/Koto-Small-7B-PT

Finetuned

Aurore-Reveil/Koto-Small-7B-IT

Finetuned

Mawdistical/Kuwutu-7B

Adapter

(1)

this model