# 🚀 Production Deployment Guide - Background Execution

## Guide to running the whole system in the background with 32 CPUs per project

---

## 📋 Overview

Run two pipelines concurrently:
- **ASR Translation**: 32 CPU workers
- **Chat Translation**: 32 CPU workers

Total: **64 CPU cores** in use

---

## ⚙️ Configuration

### 1. Check Resources

```bash
# Check the number of CPU cores
nproc
# Or
lscpu | grep "^CPU(s):"

# Check available RAM
free -h

# Recommended:
# - Minimum: 64 CPU cores
# - RAM: 16GB+ (256MB per worker: 64 workers x 256MB = 16GB)
```
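The recommendation above can be checked in one step before launching. This is a minimal sketch, not part of the project's scripts: `required_resources` and `check_resources` are hypothetical names, the 256 MB per worker figure is this guide's own estimate, and reading `MemAvailable` from `/proc/meminfo` assumes Linux.

```bash
# Hypothetical helpers (not part of the repo).
# Assumes 256 MB per worker (this guide's estimate) and Linux /proc/meminfo.

# Print cores and RAM (MB) needed for WORKERS per pipeline x PIPELINES.
required_resources() {
  local workers="$1" pipelines="$2" mb_per_worker="$3"
  local total=$(( workers * pipelines ))
  echo "cores=${total} ram_mb=$(( total * mb_per_worker ))"
}

# Return 0 if this host meets the requirement, 1 otherwise (prints why).
check_resources() {
  local workers="$1" pipelines="$2" mb_per_worker="${3:-256}"
  local need_cores=$(( workers * pipelines ))
  local need_mb=$(( need_cores * mb_per_worker ))
  local have_cores have_mb ok=0
  have_cores=$(nproc)
  # MemAvailable is reported in kB; convert to MB
  have_mb=$(awk '/^MemAvailable:/ {print int($2 / 1024)}' /proc/meminfo)
  if (( have_cores < need_cores )); then
    echo "WARN: need ${need_cores} cores, have ${have_cores}"
    ok=1
  fi
  if (( have_mb < need_mb )); then
    echo "WARN: need ${need_mb} MB RAM, have ${have_mb} MB available"
    ok=1
  fi
  return "$ok"
}
```

For example, `check_resources 32 2 && bash scripts/run_production_full.sh` only launches when the host reports 64 cores and about 16 GB of available RAM.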

### 2. Check the VLLM Server

```bash
# Check that VLLM is running
curl http://localhost:8000/v1/models

# If it does not respond, start VLLM (nohup keeps it alive after you log out)
CUDA_VISIBLE_DEVICES=4,5,6,7 nohup vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct \
  --port 8000 \
  --tensor-parallel-size 4 \
  --max-model-len 32768 \
  --max-num-batched-tokens 131072 \
  --gpu-memory-utilization 0.9 \
  > vllm_server.log 2>&1 &
```
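Loading a large model across four GPUs can take a while, so starting the pipelines right after `vllm serve` may hit connection errors. A minimal polling sketch, assuming only `curl` and the `/v1/models` endpoint used above; `wait_for_vllm` and its retry defaults are hypothetical:

```bash
# Hypothetical helper: poll /v1/models until VLLM answers or we give up.
wait_for_vllm() {
  local url="${1:-http://localhost:8000/v1/models}"
  local attempts="${2:-60}"   # number of tries
  local delay="${3:-10}"      # seconds between tries
  local i
  for (( i = 1; i <= attempts; i++ )); do
    # -f: treat HTTP errors as failures, -s: no progress output, --max-time: never hang
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      echo "VLLM is up (attempt ${i})"
      return 0
    fi
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  echo "VLLM did not answer at ${url} after ${attempts} attempts" >&2
  return 1
}
```

Typical use: `wait_for_vllm && bash scripts/run_production_full.sh`.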

---

## 🎯 Method 1: Automated Script (Recommended)

### Quick Start

```bash
cd /home/dungvpt/workspace/mlm_training/synthetic_projects

# Run both pipelines with 32 workers each
bash scripts/run_production_full.sh
```

### The script will:
1. ✅ Check the VLLM server
2. ✅ Create timestamped output directories
3. ✅ Run ASR translation (32 workers) in the background
4. ✅ Run Chat translation (32 workers) in the background
5. ✅ Save separate logs for each pipeline
6. ✅ Print the process IDs for monitoring
7. ✅ Resume automatically after an interruption
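Step 2 appears to produce directory names like `asr_run_20250128_100000` (see Output Structure). A sketch of that step, assuming runs are named `<prefix>_run_<YYYYmmdd_HHMMSS>`; `make_run_dir` is a hypothetical helper and does not reproduce the script's resume logic:

```bash
# Hypothetical helper: create <base>/<prefix>_run_<timestamp>/checkpoints
# and print the run directory path.
make_run_dir() {
  local base="$1" prefix="$2"
  local dir="${base}/${prefix}_run_$(date +%Y%m%d_%H%M%S)"
  mkdir -p "${dir}/checkpoints" || return 1
  echo "$dir"
}

# Example:
# ASR_DIR=$(make_run_dir outputs/asr_translation asr)
# CHAT_DIR=$(make_run_dir outputs/chat_translation chat)
```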

---

## 🎯 Method 2: Manual Commands

### ASR Translation (32 Workers)

```bash
cd /home/dungvpt/workspace/mlm_training/synthetic_projects
mkdir -p logs  # the log redirect below fails if logs/ does not exist

# Run in the background with nohup
nohup python -m src.asr_translation.runner \
    --input translation_for_asr/telephone2000h.txt \
    --output-dir outputs/asr_translation \
    --num-workers 32 \
    --batch-size 64 \
    --checkpoint-interval 1000 \
    --use-json \
    > logs/asr_production.log 2>&1 &

# Save the process ID
echo $! > logs/asr_pid.txt
echo "ASR Translation PID: $(cat logs/asr_pid.txt)"
```

### Chat Translation (32 Workers)

```bash
cd /home/dungvpt/workspace/mlm_training/synthetic_projects
mkdir -p logs  # the log redirect below fails if logs/ does not exist

# Run in the background with nohup
nohup python -m src.chat_translation.runner \
    --dataset tarudesu/VOZ-HSD \
    --output-dir outputs/chat_translation \
    --num-workers 32 \
    --batch-size 64 \
    --checkpoint-interval 1000 \
    --use-json \
    > logs/chat_production.log 2>&1 &

# Save the process ID
echo $! > logs/chat_pid.txt
echo "Chat Translation PID: $(cat logs/chat_pid.txt)"
```
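The launch-and-record-PID pattern is identical for both pipelines, so it can be wrapped in two small functions. A sketch with hypothetical names (`start_bg`, `is_running`); it uses only `nohup`, `$!` and `kill -0`, which checks that a process exists without actually sending it a signal:

```bash
# Hypothetical helper: run a command under nohup, log its output, record its PID.
# Usage: start_bg LOGFILE PIDFILE COMMAND [ARGS...]
start_bg() {
  local log="$1" pidfile="$2"
  shift 2
  mkdir -p "$(dirname "$log")" "$(dirname "$pidfile")"
  nohup "$@" > "$log" 2>&1 &
  echo $! > "$pidfile"
  echo "Started PID $(cat "$pidfile"), logging to ${log}"
}

# Hypothetical helper: succeed if the PID stored in PIDFILE is still alive.
is_running() {
  local pidfile="$1"
  [ -s "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null
}

# Example (same arguments as the manual ASR command):
# start_bg logs/asr_production.log logs/asr_pid.txt \
#     python -m src.asr_translation.runner \
#     --input translation_for_asr/telephone2000h.txt \
#     --output-dir outputs/asr_translation --num-workers 32 \
#     --batch-size 64 --checkpoint-interval 1000 --use-json
```

Afterwards, `is_running logs/asr_pid.txt && echo "ASR still running"` gives a quick liveness check.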

---

## 📊 Monitoring

### Real-time Progress Monitoring

```bash
# Monitor ASR translation
tail -f logs/asr_production.log

# Monitor Chat translation
tail -f logs/chat_production.log

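# Monitor both at once (split terminal)
# Terminal 1:
tail -f logs/asr_production.log

# Terminal 2:
tail -f logs/chat_production.log

# Or follow both in one terminal (tail prints a header before each file's lines)
tail -f logs/asr_production.log logs/chat_production.log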
```

### Check Progress

```bash
# Count processed results
wc -l outputs/asr_translation/asr_run_*/results.jsonl
wc -l outputs/chat_translation/chat_run_*/results.jsonl

# Show the latest results (-q drops the per-file headers that break jq when several runs match)
tail -q -n 5 outputs/asr_translation/asr_run_*/results.jsonl | jq .
tail -q -n 5 outputs/chat_translation/chat_run_*/results.jsonl | jq .

# Watch in real time
watch -n 5 'wc -l outputs/*/*/results.jsonl'
```
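To turn line counts into a percentage, the helpers below pick the newest run directory and compare its line count with the totals listed under Expected Performance. A sketch assuming one result per line in `results.jsonl` and run-directory timestamps that sort lexically; `latest_run_dir` and `progress` are hypothetical names:

```bash
# Hypothetical helper: newest <base>/<prefix>_run_* directory (timestamps sort lexically).
latest_run_dir() {
  local base="$1" prefix="$2"
  ls -d "${base}/${prefix}"_run_* 2>/dev/null | sort | tail -n 1
}

# Hypothetical helper: percentage of DONE out of TOTAL, one decimal place.
progress() {
  awk -v d="$1" -v t="$2" 'BEGIN { printf "%.1f%%\n", (t > 0 ? 100 * d / t : 0) }'
}

# Example with the ASR total (1,647,738 items):
# run=$(latest_run_dir outputs/asr_translation asr)
# progress "$(wc -l < "$run/results.jsonl")" 1647738
```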

### System Resources

```bash
# CPU usage
top -u $USER

# or htop (more user-friendly)
htop

# Process status (the [p] pattern keeps grep from matching its own command line)
ps aux | grep "[p]ython -m src"

# Specific processes
ps -p $(cat logs/asr_pid.txt) -o pid,cmd,%cpu,%mem,etime
ps -p $(cat logs/chat_pid.txt) -o pid,cmd,%cpu,%mem,etime
```
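`ps -p` reports only the parent process. How `--num-workers` is implemented (threads or subprocesses) is not documented here, so this Linux-only sketch reports both the parent's thread count and its number of direct child processes from `/proc`; `worker_summary` is a hypothetical name:

```bash
# Hypothetical helper (Linux only): thread count and direct child count of a PID.
worker_summary() {
  local pid="$1" threads children
  if [ ! -r "/proc/${pid}/status" ]; then
    echo "PID ${pid} is not running" >&2
    return 1
  fi
  # "Threads:" in /proc/<pid>/status is the number of threads in the process
  threads=$(awk '/^Threads:/ {print $2}' "/proc/${pid}/status")
  # Count processes whose parent PID (PPid) equals this PID;
  # "|| true" ignores errors from processes that exit while grep runs
  children=$( { grep -lE "^PPid:[[:space:]]+${pid}\$" /proc/[0-9]*/status 2>/dev/null || true; } | wc -l)
  echo "pid=${pid} threads=${threads} children=${children}"
}
```

Example: `worker_summary "$(cat logs/asr_pid.txt)"`. With 32 workers, one of the two numbers should be in the low thirties.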

---

## 🛑 Control Operations

### Stop Processes

```bash
# Stop gracefully (saves checkpoint)
kill -SIGINT $(cat logs/asr_pid.txt)
kill -SIGINT $(cat logs/chat_pid.txt)

# or use the script
bash scripts/stop_production.sh

# Force stop (only if graceful doesn't work)
kill -9 $(cat logs/asr_pid.txt)
kill -9 $(cat logs/chat_pid.txt)
```
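The escalation above (graceful first, force only if needed) can be automated. A sketch with a hypothetical name, `stop_gracefully`: send SIGINT, wait up to a timeout, then SIGKILL. One caveat: a command started with `&` from a non-interactive script (as `scripts/run_production_full.sh` presumably does) inherits SIGINT as ignored, and Python only turns SIGINT into `KeyboardInterrupt` if it was not ignored at startup. Unless the runner installs its own SIGINT handler, such a process may ignore `kill -SIGINT`; the optional third argument lets you send `TERM` instead, although whether the runner checkpoints on SIGTERM is not documented here.

```bash
# Hypothetical helper: signal PID, wait up to TIMEOUT seconds, then SIGKILL.
# Usage: stop_gracefully PID [TIMEOUT_SECONDS] [SIGNAL]
stop_gracefully() {
  local pid="$1" timeout="${2:-60}" sig="${3:-INT}"
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "PID ${pid} is not running"
    return 0
  fi
  kill -s "$sig" "$pid"
  local waited=0
  # Poll once per second until the process exits or the timeout is reached
  while kill -0 "$pid" 2>/dev/null && (( waited < timeout )); do
    sleep 1
    waited=$(( waited + 1 ))
  done
  if kill -0 "$pid" 2>/dev/null; then
    # SIGKILL skips any checkpoint logic and reaches only this PID
    echo "PID ${pid} still alive after ${timeout}s, sending SIGKILL"
    kill -s KILL "$pid"
  else
    echo "PID ${pid} exited after ${waited}s"
  fi
}
```

Example: `stop_gracefully "$(cat logs/asr_pid.txt)" 120`.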

### Pause & Resume

```bash
# Pause (uses no CPU but keeps memory allocated)
kill -STOP $(cat logs/asr_pid.txt)
kill -STOP $(cat logs/chat_pid.txt)

# Resume
kill -CONT $(cat logs/asr_pid.txt)
kill -CONT $(cat logs/chat_pid.txt)
```
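`kill -STOP` pauses only the PID it is given. If the runner spawns worker subprocesses (not documented here), they keep running and keep sending requests to VLLM. A Linux-only sketch that walks `/proc` to signal a process and all of its descendants; `signal_tree` is a hypothetical name:

```bash
# Hypothetical helper (Linux only): send SIGNAL to PID and all of its descendants.
# Usage: signal_tree SIGNAL PID   e.g. signal_tree STOP 1234
signal_tree() {
  local sig="$1" pid="$2" children child
  # Record direct children (PPid == $pid) before signalling, because a killed
  # parent's children are re-parented and would no longer match
  children=$( { grep -lE "^PPid:[[:space:]]+${pid}\$" /proc/[0-9]*/status 2>/dev/null || true; } | cut -d/ -f3)
  kill -s "$sig" "$pid" 2>/dev/null || true
  for child in $children; do
    signal_tree "$sig" "$child"
  done
}
```

Example: `signal_tree STOP "$(cat logs/asr_pid.txt)"` to pause and `signal_tree CONT "$(cat logs/asr_pid.txt)"` to resume.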

### Restart (Auto-Resume)

```bash
# Simply run the same command again
# The resume feature automatically skips items that were already processed
bash scripts/run_production_full.sh
```

---

## 📈 Performance Tuning

### For High Throughput

```bash
# Increase workers and batch size
NUM_WORKERS=48 \
BATCH_SIZE=96 \
bash scripts/run_production_full.sh
```

### For Memory-Constrained Systems

```bash
# Reduce workers and batch size
NUM_WORKERS=16 \
BATCH_SIZE=32 \
bash scripts/run_production_full.sh
```

### Optimal Settings (64 cores available)

```bash
# 32 workers per pipeline = 64 total
NUM_WORKERS=32 \
BATCH_SIZE=64 \
CHECKPOINT_INTERVAL=500 \
bash scripts/run_production_full.sh
```
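These examples assume `scripts/run_production_full.sh` reads `NUM_WORKERS`, `BATCH_SIZE` and `CHECKPOINT_INTERVAL` from the environment. The script itself is not shown here; the following is a hypothetical sketch of that pattern using `${VAR:-default}`, with defaults taken from the manual commands in Method 2 and a warning when two pipelines would request more workers than `nproc` reports:

```bash
# Hypothetical sketch: read overrides from the environment, falling back to defaults.
load_config() {
  NUM_WORKERS="${NUM_WORKERS:-32}"
  BATCH_SIZE="${BATCH_SIZE:-64}"
  CHECKPOINT_INTERVAL="${CHECKPOINT_INTERVAL:-1000}"
  local total=$(( NUM_WORKERS * 2 ))   # two pipelines run side by side
  local cores
  cores=$(nproc)
  if (( total > cores )); then
    echo "WARN: ${total} workers requested on ${cores} cores" >&2
  fi
  echo "workers=${NUM_WORKERS} batch=${BATCH_SIZE} checkpoint=${CHECKPOINT_INTERVAL}"
}

# The launcher would then pass these on, e.g.:
#   --num-workers "$NUM_WORKERS" --batch-size "$BATCH_SIZE" \
#   --checkpoint-interval "$CHECKPOINT_INTERVAL"
```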

---

## 📁 Output Structure

```
synthetic_projects/
├── outputs/
│   ├── asr_translation/
│   │   └── asr_run_20250128_100000/
│   │       ├── results.jsonl              # Incremental results
│   │       └── checkpoints/
│   │           ├── checkpoint_00001000.jsonl
│   │           ├── checkpoint_00002000.jsonl
│   │           └── ...
│   └── chat_translation/
│       └── chat_run_20250128_100000/
│           ├── results.jsonl
│           └── checkpoints/
│               ├── checkpoint_00001000.jsonl
│               └── ...
└── logs/
    ├── asr_production.log
    ├── chat_production.log
    ├── asr_pid.txt
    └── chat_pid.txt
```

---

## ✅ Validation

### While Running

```bash
# Validate ASR results (sample; -q omits the per-file headers head adds when several runs match)
head -q -n 100 outputs/asr_translation/asr_run_*/results.jsonl > /tmp/asr_sample.jsonl
python scripts/validate_asr_output.py /tmp/asr_sample.jsonl

# Validate Chat results (sample)
head -q -n 100 outputs/chat_translation/chat_run_*/results.jsonl > /tmp/chat_sample.jsonl
python scripts/validate_chat_output.py /tmp/chat_sample.jsonl
```

### After Completion

```bash
# Full validation
python scripts/validate_asr_output.py outputs/asr_translation/asr_run_*/results.jsonl
python scripts/validate_chat_output.py outputs/chat_translation/chat_run_*/results.jsonl

# Calculate statistics
bash scripts/calculate_stats.sh outputs/asr_translation/asr_run_*/results.jsonl
bash scripts/calculate_stats.sh outputs/chat_translation/chat_run_*/results.jsonl
```

---

## 🔧 Troubleshooting

### Issue: Process died unexpectedly

```bash
# Check logs for errors
tail -n 50 logs/asr_production.log
tail -n 50 logs/chat_production.log

# Check if process still running
ps -p $(cat logs/asr_pid.txt)
ps -p $(cat logs/chat_pid.txt)

# Restart with resume
bash scripts/run_production_full.sh
```

### Issue: VLLM server overloaded

```bash
# Check VLLM GPU usage
nvidia-smi

# Reduce number of workers temporarily
NUM_WORKERS=16 bash scripts/run_production_full.sh
```

### Issue: Out of memory

```bash
# Check memory usage
free -h

# Reduce workers
NUM_WORKERS=16 BATCH_SIZE=32 bash scripts/run_production_full.sh
```

### Issue: Slow processing

```bash
# Check CPU usage (CPU-bound workers should be near 100%; workers mostly waiting on VLLM responses will show much less)
top

# Check VLLM server response time
curl -w "@-" -o /dev/null -s http://localhost:8000/v1/models <<'EOF'
    time_namelookup:  %{time_namelookup}\n
       time_connect:  %{time_connect}\n
          time_total:  %{time_total}\n
EOF

# Check network latency if VLLM is remote
```
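Timing `/v1/models` only shows whether the server answers, not how long generation takes under load. A sketch that times one minimal request to the OpenAI-compatible `/v1/chat/completions` endpoint exposed by `vllm serve`; `probe_latency` is a hypothetical name, and its default model matches the serve command in Configuration:

```bash
# Hypothetical helper: time one tiny chat completion against VLLM.
# Usage: probe_latency [BASE_URL] [MODEL]
probe_latency() {
  local base="${1:-http://localhost:8000}"
  local model="${2:-Qwen/Qwen3-Next-80B-A3B-Instruct}"
  # max_tokens=1 keeps the request minimal; -f makes HTTP errors fail the call
  curl -fsS --max-time 60 -o /dev/null \
    -w 'time_total: %{time_total}s\n' \
    -H 'Content-Type: application/json' \
    -d "{\"model\": \"${model}\", \"messages\": [{\"role\": \"user\", \"content\": \"ping\"}], \"max_tokens\": 1}" \
    "${base}/v1/chat/completions"
}
```

Run it once with the pipelines stopped and once under load; a large difference points at the VLLM server rather than at the workers.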

---

## 📊 Expected Performance

### With 32 Workers Each

| Metric | ASR Translation | Chat Translation |
|--------|----------------|------------------|
| Workers | 32 | 32 |
| Throughput | ~160-320 req/sec | ~160-320 req/sec |
| Time per item (per worker) | ~0.1-0.2s | ~0.1-0.2s |
| Memory usage | ~8-10GB | ~8-10GB |

### Estimated Completion Time

```
ASR Translation:
- Total items: 1,647,738
- Throughput: 200 req/sec
- Estimated time: ~2.3 hours

Chat Translation:
- Total items: 10,747,733
- Throughput: 200 req/sec
- Estimated time: ~15 hours
```
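The estimates above are total items divided by throughput, converted to hours. The first function reproduces them; the second is a sketch that replaces the assumed 200 req/sec with the rate observed so far. Both names are hypothetical:

```bash
# Hypothetical helper: hours = items / throughput / 3600
eta_hours() {
  awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f\n", n / r / 3600 }'
}

# Hypothetical helper: remaining hours from observed progress.
# Usage: remaining_hours COMPLETED TOTAL ELAPSED_SECONDS
remaining_hours() {
  local completed="$1" total="$2" elapsed="$3"
  awk -v d="$completed" -v t="$total" -v e="$elapsed" 'BEGIN {
    if (d == 0) { print "unknown"; exit }
    # observed rate = d / e items per second; remaining time = (t - d) / rate
    printf "%.1f\n", (t - d) * e / d / 3600
  }'
}
```

For example, `eta_hours 1647738 200` gives the ASR figure above. For a live estimate, `remaining_hours "$(cat outputs/asr_translation/asr_run_*/results.jsonl | wc -l)" 1647738 "$(ps -p "$(cat logs/asr_pid.txt)" -o etimes=)"`; after a resume, the line count includes earlier results, so this estimate is optimistic.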

---

## 🎯 Best Practices

1. **Monitor early**: Watch first 1000 items for any issues
2. **Check quality**: Validate samples periodically
3. **Resource balance**: Don't overload VLLM server
4. **Backup logs**: Keep logs for debugging
5. **Resume friendly**: Use default resume mode
6. **Checkpoint often**: Keep checkpoint interval reasonable

---

## 📞 Quick Reference Commands

```bash
# Start production
bash scripts/run_production_full.sh

# Monitor
tail -f logs/asr_production.log
tail -f logs/chat_production.log

# Check progress
watch -n 5 'wc -l outputs/*/*/results.jsonl'

# Stop gracefully
bash scripts/stop_production.sh

# Validate
python scripts/validate_asr_output.py outputs/asr_translation/asr_run_*/results.jsonl
python scripts/validate_chat_output.py outputs/chat_translation/chat_run_*/results.jsonl
```

---

## ✨ Summary

**Configuration**: 32 workers per pipeline = 64 total workers  
**Resume**: Automatic, enabled by default  
**Saving**: Incremental, real-time  
**Monitoring**: Live logs and progress tracking  
**Recovery**: Checkpoint-based, no data loss  

**Ready for production! 🚀**