ThaiLLM-30B

ThaiLLM-30B is a continued pre-training of Qwen3-30B-A3B, trained on a diverse corpus of approximately 63 billion tokens.

Important Note: This is a base model that requires instruction fine-tuning to align with specific user requirements and use cases.

Data

The training corpus consists of the following datasets:

| Dataset | Tokens |
|---|---|
| Fineweb2-ENG | 24,000,000,000 |
| Fineweb2-TH | 31,525,674,209 |
| CuratedData | 8,054,246,789 |

CuratedData Breakdown

| Category | Token Count |
|---|---|
| Business & Finance | 736,071,807 |
| News | 1,700,662,378 |
| Education | 554,889,778 |
| Social | 211,000,000 |
| Government | 40,492,117 |
| Medical | 42,987,587 |
| Conversation | 80,919,390 |
| Code | 620,218 |
| Research Articles | 4,185,649,758 |
| Law | 467,994,847 |
| Travel | 6,948,290 |
| Buddhism | 21,600,000 |
| Others | 4,410,619 |

*Token counts were calculated using the Qwen3 tokenizer.
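
For reference, counting tokens with the Qwen3 tokenizer looks roughly like this (a minimal sketch; the exact tokenizer checkpoint used for the statistics above is our assumption):

from transformers import AutoTokenizer

# Count tokens in a text the same way the corpus statistics were computed.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")
text = "ตัวอย่างข้อความภาษาไทย"  # "An example Thai text"
n_tokens = len(tokenizer(text)["input_ids"])
print(n_tokens)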

Requirements

Support for Qwen3 has been merged into the Hugging Face transformers library; we strongly recommend using the latest version of transformers.

With transformers<4.51.0, you will encounter the following error:

KeyError: 'qwen3'
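
A quick environment check before loading the model (a minimal sketch; packaging is installed as a dependency of transformers):

from packaging import version
import transformers

# Qwen3 support landed in transformers 4.51.0; fail fast on older installs.
assert version.parse(transformers.__version__) >= version.parse("4.51.0"), (
    f"transformers {transformers.__version__} is too old for Qwen3 checkpoints"
)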

Usage: Training

Important: This is a base model and requires instruction fine-tuning before use to ensure optimal performance for your specific tasks and requirements.

Recommended Training Setup

We recommend using LLaMA-Factory for instruction fine-tuning. This framework provides an easy-to-use interface for training language models with various optimization techniques.

Quick Start with LLaMA-Factory

# Clone the repository
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory

# Install dependencies
pip install -e .

# Example training command for LoRA
llamafactory-cli train \
    --model_name_or_path ThaiLLM/ThaiLLM-30B \
    --stage sft \
    --do_train \
    --finetuning_type lora \
    --dataset your_dataset \
    --template qwen3 \
    --cutoff_len 8192 \
    --learning_rate 5e-05 \
    --num_train_epochs 3.0 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --output_dir saves/ThaiLLM-30B-lora \
    --bf16
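
Note that your_dataset above is a placeholder: LLaMA-Factory resolves --dataset names through data/dataset_info.json inside the repository. A minimal sketch of registering an alpaca-style SFT file (the file name and record contents are illustrative):

import json, pathlib

# Inside the LLaMA-Factory checkout: data files and the registry live under data/.
data_dir = pathlib.Path("data")

# Alpaca-style SFT records use "instruction" / "input" / "output" fields.
records = [
    {
        "instruction": "สรุปข้อความต่อไปนี้",  # "Summarize the following text"
        "input": "...",
        "output": "...",
    },
]
(data_dir / "your_dataset.json").write_text(
    json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
)

# Register the file so that --dataset your_dataset resolves to it.
info_path = data_dir / "dataset_info.json"
info = json.loads(info_path.read_text(encoding="utf-8"))
info["your_dataset"] = {"file_name": "your_dataset.json"}
info_path.write_text(json.dumps(info, ensure_ascii=False, indent=2), encoding="utf-8")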

Usage: Inference

Below are code snippets to help you get started quickly with running the model. First, install the necessary libraries:

pip install -U transformers torch accelerate

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ThaiLLM/ThaiLLM-30B"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    device_map="auto", 
    torch_dtype=torch.bfloat16
)

# Example prompt (Thai: "What is the pH of pure water?")
prompt = "น้ำบริสุทธิ์มีค่า pH เท่าใด"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate response
with torch.inference_mode():
    generate_ids = model.generate(
        **inputs,  # pass input_ids and attention_mask together
        max_new_tokens=500,
        repetition_penalty=1.2,
        num_beams=1,
        do_sample=True,
        top_k=40,
        top_p=0.75,
        temperature=0.4,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.batch_decode(
    generate_ids, 
    skip_special_tokens=True, 
    clean_up_tokenization_spaces=True
)[0]

print(response)
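
The snippet above performs raw text continuation, since this is a base model. After instruction fine-tuning with a chat template (for example the qwen3 template used in the training command above), prompts would instead go through the tokenizer's chat template. A minimal sketch, assuming your fine-tuned tokenizer ships a chat template:

# Format a conversation with the chat template of the fine-tuned tokenizer.
messages = [{"role": "user", "content": "น้ำบริสุทธิ์มีค่า pH เท่าใด"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn prefix
    return_tensors="pt",
).to(model.device)
generate_ids = model.generate(input_ids, max_new_tokens=500)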

Benchmarks

We evaluated ThaiLLM-30B against Qwen3-30B-Base on multiple-choice benchmarks in both Thai and English. Each benchmark scores a question by the probability the model assigns to each answer choice under next-token prediction; the highest-probability choice is taken as the model's answer.
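
A minimal sketch of how such log-likelihood scoring typically works (our illustration of a standard multiple-choice harness, not the exact evaluation code used here):

import torch

def choice_logprob(model, tokenizer, question, choice):
    # Log-probability the model assigns to the choice tokens following the question.
    q_len = tokenizer(question, return_tensors="pt").input_ids.shape[1]
    ids = tokenizer(question + choice, return_tensors="pt").input_ids.to(model.device)
    with torch.inference_mode():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # position i predicts token i+1
    targets = ids[0, 1:]
    # Sum only over the choice tokens (everything after the question prefix).
    rows = torch.arange(q_len - 1, ids.shape[1] - 1, device=ids.device)
    return log_probs[rows, targets[q_len - 1:]].sum().item()

# The choice with the highest log-probability is taken as the model's answer.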

Natural Language Understanding (NLU)

| Task | Qwen3-30B-Base | ThaiLLM-30B (Qwen3-30B-A3B-cpt) | Δ |
|---|---|---|---|
| Belebele (TH) | 0.8704 | 0.8849 | +0.0145 |
| XNLI (TH) | 0.7507 | 0.7363 | -0.0144 |
| ThaiExam (Overall) | 0.5947 | 0.6478 | +0.0531 |
| ├── A-Level | 0.5276 | 0.6457 | +0.1181 |
| ├── IC | 0.6737 | 0.7158 | +0.0421 |
| ├── ONET | 0.5864 | 0.6296 | +0.0432 |
| ├── TGAT | 0.7538 | 0.7692 | +0.0154 |
| └── TPAT-1 | 0.5259 | 0.5517 | +0.0258 |
| M3Exam (Overall) | 0.5452 | 0.5660 | +0.0208 |
| MMLU (ENG, 5-shot) | 0.9600 | 0.9500 | -0.0100 |
| MMLU-Thai | 0.7004 | 0.7284 | +0.0280 |
| XCOPA-Thai | 0.8940 | 0.8760 | -0.0180 |
| M6Exam (Overall) | 0.5869 | 0.6196 | +0.0327 |
| ├── English | 0.8846 | 0.8462 | -0.0384 |
| ├── Math | 0.5294 | 0.5294 | 0.0000 |
| ├── Science | 0.6071 | 0.6786 | +0.0715 |
| ├── Social | 0.7091 | 0.7636 | +0.0545 |
| └── Thai | 0.4980 | 0.5388 | +0.0408 |

| Model | Average Score |
|---|---|
| Qwen3-30B-Base | 0.7378 |
| ThaiLLM-30B | 0.7511 |

MMLU-ProX

| Category | Qwen3-30B-Base | ThaiLLM-30B | Δ |
|---|---|---|---|
| Biology | 0.7294 | 0.7462 | +0.0168 |
| Business | 0.4411 | 0.4499 | +0.0088 |
| Chemistry | 0.4064 | 0.4046 | -0.0018 |
| Computer Science | 0.5122 | 0.5220 | +0.0098 |
| Economics | 0.6434 | 0.6339 | -0.0095 |
| Engineering | 0.4943 | 0.4881 | -0.0062 |
| Health | 0.4891 | 0.5226 | +0.0335 |
| History | 0.4514 | 0.4488 | -0.0026 |
| Law | 0.2982 | 0.2982 | 0.0000 |
| Math | 0.4537 | 0.4597 | +0.0060 |
| Other | 0.3918 | 0.4232 | +0.0314 |
| Philosophy | 0.3768 | 0.3627 | -0.0141 |
| Physics | 0.4450 | 0.4442 | -0.0008 |
| Psychology | 0.5952 | 0.6078 | +0.0126 |

| Model | Overall |
|---|---|
| Qwen3-30B-Base | 0.4739 |
| ThaiLLM-30B | 0.4797 |

Limitations

  • This is a base model and requires instruction fine-tuning for optimal performance
  • Performance on specialized domains may require domain-specific fine-tuning
  • As with all language models, outputs should be verified for accuracy in critical applications

Citation

@misc{qwen3technicalreport,
      title={Qwen3 Technical Report}, 
      author={Qwen Team},
      year={2025},
      eprint={2505.09388},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.09388}, 
}