🚀 Production Deployment Guide - Background Execution
Guide to running the entire system in the background with 32 CPU workers per pipeline
📋 Overview
Run both pipelines concurrently:
- ASR Translation: 32 CPU workers
- Chat Translation: 32 CPU workers
Total: 64 CPU cores in use
⚙️ Configuration
1. Check Resources
# Check the number of CPU cores
nproc
# Or
lscpu | grep "^CPU(s):"
# Check available RAM
free -h
# Recommendations:
# - Minimum: 64 CPU cores
# - RAM: 16GB+ (64 workers x 256MB per worker = 16GB)
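If you want the launch to fail fast on an under-provisioned host, a short pre-flight check can compare the machine against the recommendations above (a sketch, not part of the shipped scripts):
# Pre-flight check (sketch) -- thresholds match the recommendations above
CORES=$(nproc)
RAM_GB=$(free -g | awk '/^Mem:/ {print $2}')
if [ "$CORES" -lt 64 ] || [ "$RAM_GB" -lt 16 ]; then
  echo "WARNING: ${CORES} cores / ${RAM_GB}GB RAM is below the recommended 64 cores / 16GB"
fi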
2. Check the VLLM Server
# Check that VLLM is running
curl http://localhost:8000/v1/models
# If it is not running, start VLLM:
CUDA_VISIBLE_DEVICES=4,5,6,7 vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct \
--port 8000 \
--tensor-parallel-size 4 \
--max-model-len 32768 \
--max-num-batched-tokens 131072 \
--gpu-memory-utilization 0.9 &
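Model loading takes a few minutes, so it helps to wait until the endpoint above actually answers before launching the pipelines (a simple readiness loop, shown as a sketch):
# Wait until the VLLM server responds on /v1/models (sketch)
until curl -sf http://localhost:8000/v1/models > /dev/null; do
  echo "Waiting for VLLM server..."
  sleep 10
done
echo "VLLM server is up"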
🎯 Method 1: Automated Script (Recommended)
Quick Start
cd /home/dungvpt/workspace/mlm_training/synthetic_projects
# Run both pipelines with 32 workers each
bash scripts/run_production_full.sh
The script will (see the sketch after this list):
- ✅ Check the VLLM server
- ✅ Create timestamped output directories
- ✅ Run ASR translation (32 workers) in the background
- ✅ Run Chat translation (32 workers) in the background
- ✅ Save separate logs for each pipeline
- ✅ Print the process IDs for monitoring
- ✅ Automatically resume if interrupted
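For reference, a minimal launcher along these lines would cover the steps listed above; it reuses the manual commands from Method 2 and the NUM_WORKERS / BATCH_SIZE / CHECKPOINT_INTERVAL variables from the tuning section below, but the actual scripts/run_production_full.sh may differ in its details:
#!/usr/bin/env bash
# Sketch of a launcher -- the shipped scripts/run_production_full.sh may differ.
set -euo pipefail

NUM_WORKERS="${NUM_WORKERS:-32}"
BATCH_SIZE="${BATCH_SIZE:-64}"
CHECKPOINT_INTERVAL="${CHECKPOINT_INTERVAL:-1000}"

mkdir -p logs outputs/asr_translation outputs/chat_translation

# 1. Check that the VLLM server is reachable
curl -sf http://localhost:8000/v1/models > /dev/null || { echo "VLLM server not reachable"; exit 1; }

# 2. ASR translation in the background
nohup python -m src.asr_translation.runner \
  --input translation_for_asr/telephone2000h.txt \
  --output-dir outputs/asr_translation \
  --num-workers "$NUM_WORKERS" \
  --batch-size "$BATCH_SIZE" \
  --checkpoint-interval "$CHECKPOINT_INTERVAL" \
  --use-json \
  > logs/asr_production.log 2>&1 &
echo $! > logs/asr_pid.txt

# 3. Chat translation in the background
nohup python -m src.chat_translation.runner \
  --dataset tarudesu/VOZ-HSD \
  --output-dir outputs/chat_translation \
  --num-workers "$NUM_WORKERS" \
  --batch-size "$BATCH_SIZE" \
  --checkpoint-interval "$CHECKPOINT_INTERVAL" \
  --use-json \
  > logs/chat_production.log 2>&1 &
echo $! > logs/chat_pid.txt

echo "ASR Translation PID:  $(cat logs/asr_pid.txt)"
echo "Chat Translation PID: $(cat logs/chat_pid.txt)"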
🎯 Method 2: Manual Commands
ASR Translation (32 Workers)
cd /home/dungvpt/workspace/mlm_training/synthetic_projects
# Make sure the logs directory exists
mkdir -p logs
# Run in the background with nohup
nohup python -m src.asr_translation.runner \
--input translation_for_asr/telephone2000h.txt \
--output-dir outputs/asr_translation \
--num-workers 32 \
--batch-size 64 \
--checkpoint-interval 1000 \
--use-json \
> logs/asr_production.log 2>&1 &
# Save the process ID
echo $! > logs/asr_pid.txt
echo "ASR Translation PID: $(cat logs/asr_pid.txt)"
Chat Translation (32 Workers)
cd /home/dungvpt/workspace/mlm_training/synthetic_projects
# Run in the background with nohup
nohup python -m src.chat_translation.runner \
--dataset tarudesu/VOZ-HSD \
--output-dir outputs/chat_translation \
--num-workers 32 \
--batch-size 64 \
--checkpoint-interval 1000 \
--use-json \
> logs/chat_production.log 2>&1 &
# Save the process ID
echo $! > logs/chat_pid.txt
echo "Chat Translation PID: $(cat logs/chat_pid.txt)"
📊 Monitoring
Real-time Progress Monitoring
# Monitor ASR translation
tail -f logs/asr_production.log
# Monitor Chat translation
tail -f logs/chat_production.log
# Monitor both at the same time (split terminal)
# Terminal 1:
tail -f logs/asr_production.log
# Terminal 2:
tail -f logs/chat_production.log
Check Progress
# Count how many results have been processed
wc -l outputs/asr_translation/asr_run_*/results.jsonl
wc -l outputs/chat_translation/chat_run_*/results.jsonl
# View the latest results
tail -n 5 outputs/asr_translation/asr_run_*/results.jsonl | jq .
tail -n 5 outputs/chat_translation/chat_run_*/results.jsonl | jq .
# Watch in real time
watch -n 5 'wc -l outputs/*/*/results.jsonl'
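To turn those line counts into a throughput figure, sample the file twice and divide by the interval (a sketch, pointed at the ASR run here; swap the path for the chat run):
# Rough throughput over the last minute (sketch)
BEFORE=$(cat outputs/asr_translation/asr_run_*/results.jsonl | wc -l)
sleep 60
AFTER=$(cat outputs/asr_translation/asr_run_*/results.jsonl | wc -l)
echo "~$(( (AFTER - BEFORE) / 60 )) items/sec over the last minute"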
System Resources
# CPU usage
top -u $USER
# or htop (more user-friendly)
htop
# Process status
ps aux | grep "python -m src"
# Specific processes
ps -p $(cat logs/asr_pid.txt) -o pid,cmd,%cpu,%mem,etime
ps -p $(cat logs/chat_pid.txt) -o pid,cmd,%cpu,%mem,etime
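To see aggregate CPU and memory usage across all pipeline workers in one number (a sketch using ps and awk):
# Sum CPU/MEM across every worker process (sketch)
ps -u "$USER" -o %cpu,%mem,cmd | grep "[p]ython -m src" \
  | awk '{ cpu += $1; mem += $2 } END { printf "total CPU: %.0f%%  total MEM: %.1f%%\n", cpu, mem }'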
🛑 Control Operations
Stop Processes
# Stop gracefully (saves checkpoint)
kill -SIGINT $(cat logs/asr_pid.txt)
kill -SIGINT $(cat logs/chat_pid.txt)
# or use the script
bash scripts/stop_production.sh
# Force stop (only if graceful doesn't work)
kill -9 $(cat logs/asr_pid.txt)
kill -9 $(cat logs/chat_pid.txt)
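If scripts/stop_production.sh is not available on a host, the same effect can be had with a few lines built on the PID files written at launch (a sketch):
# Graceful stop using the PID files written at launch (sketch)
for pidfile in logs/asr_pid.txt logs/chat_pid.txt; do
  if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
    kill -SIGINT "$(cat "$pidfile")"
    echo "Sent SIGINT to PID $(cat "$pidfile") ($pidfile)"
  fi
done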
Pause & Resume
# Pause (uses no CPU but keeps memory)
kill -STOP $(cat logs/asr_pid.txt)
kill -STOP $(cat logs/chat_pid.txt)
# Resume
kill -CONT $(cat logs/asr_pid.txt)
kill -CONT $(cat logs/chat_pid.txt)
Restart (Auto-Resume)
# Simply run the same command again
# The resume feature automatically skips items that were already processed
bash scripts/run_production_full.sh
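Conceptually, resume works by collecting the IDs already present in results.jsonl and skipping them on the next run. You can see how many items a restart would skip with jq (a sketch; the "id" field name is an assumption and the runners' actual schema may differ):
# Count items a restart would skip (sketch; assumes an "id" field per result line)
jq -r '.id' outputs/asr_translation/asr_run_*/results.jsonl | sort -u | wc -l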
📈 Performance Tuning
For High Throughput
# Increase workers and batch size
NUM_WORKERS=48 \
BATCH_SIZE=96 \
bash scripts/run_production_full.sh
For Memory-Constrained Systems
# Decrease workers and batch size
NUM_WORKERS=16 \
BATCH_SIZE=32 \
bash scripts/run_production_full.sh
Optimal Settings (64 cores available)
# 32 workers per pipeline = 64 total
NUM_WORKERS=32 \
BATCH_SIZE=64 \
CHECKPOINT_INTERVAL=500 \
bash scripts/run_production_full.sh
📁 Output Structure
outputs/
├── asr_translation/
│   └── asr_run_20250128_100000/
│       ├── results.jsonl              # Incremental results
│       └── checkpoints/
│           ├── checkpoint_00001000.jsonl
│           ├── checkpoint_00002000.jsonl
│           └── ...
├── chat_translation/
│   └── chat_run_20250128_100000/
│       ├── results.jsonl
│       └── checkpoints/
│           ├── checkpoint_00001000.jsonl
│           └── ...
└── logs/
    ├── asr_production.log
    ├── chat_production.log
    ├── asr_pid.txt
    └── chat_pid.txt
✅ Validation
While Running
# Validate ASR results (sample)
head -n 100 outputs/asr_translation/asr_run_*/results.jsonl > /tmp/asr_sample.jsonl
python scripts/validate_asr_output.py /tmp/asr_sample.jsonl
# Validate Chat results (sample)
head -n 100 outputs/chat_translation/chat_run_*/results.jsonl > /tmp/chat_sample.jsonl
python scripts/validate_chat_output.py /tmp/chat_sample.jsonl
After Completion
# Full validation
python scripts/validate_asr_output.py outputs/asr_translation/asr_run_*/results.jsonl
python scripts/validate_chat_output.py outputs/chat_translation/chat_run_*/results.jsonl
# Calculate statistics
bash scripts/calculate_stats.sh outputs/asr_translation/asr_run_*/results.jsonl
bash scripts/calculate_stats.sh outputs/chat_translation/chat_run_*/results.jsonl
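Independent of the validation scripts, a quick structural check confirms that every line of a results file parses as JSON (a sketch using jq alone):
# Structural sanity check: every line must parse as JSON (sketch)
jq -e . outputs/asr_translation/asr_run_*/results.jsonl > /dev/null \
  && echo "All lines parse as JSON" \
  || echo "Found malformed JSON lines"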
🔧 Troubleshooting
Issue: Process died unexpectedly
# Check logs for errors
tail -n 50 logs/asr_production.log
tail -n 50 logs/chat_production.log
# Check if process still running
ps -p $(cat logs/asr_pid.txt)
ps -p $(cat logs/chat_pid.txt)
# Restart with resume
bash scripts/run_production_full.sh
Issue: VLLM server overloaded
# Check VLLM GPU usage
nvidia-smi
# Reduce number of workers temporarily
NUM_WORKERS=16 bash scripts/run_production_full.sh
Issue: Out of memory
# Check memory usage
free -h
# Reduce workers
NUM_WORKERS=16 BATCH_SIZE=32 bash scripts/run_production_full.sh
Issue: Slow processing
# Check CPU usage (should be ~100% per worker)
top
# Check VLLM server response time
curl -w "@-" -o /dev/null -s http://localhost:8000/v1/models <<'EOF'
time_namelookup: %{time_namelookup}\n
time_connect: %{time_connect}\n
time_total: %{time_total}\n
EOF
# Check network latency if VLLM is remote
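A crude end-to-end latency probe can time a handful of sequential requests against the same endpoint (a sketch):
# Time 10 sequential requests to the VLLM endpoint (sketch)
for i in $(seq 1 10); do
  curl -o /dev/null -s -w "request $i: %{time_total}s\n" http://localhost:8000/v1/models
done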
📊 Expected Performance
With 32 Workers Each
| Metric | ASR Translation | Chat Translation |
|---|---|---|
| Workers | 32 | 32 |
| Throughput | ~160-320 req/sec | ~160-320 req/sec |
| Time per item | ~0.1-0.2s | ~0.1-0.2s |
| Memory usage | ~8-10GB | ~8-10GB |
Estimated Completion Time
ASR Translation:
- Total items: 1,647,738
- Throughput: 200 req/sec
- Estimated time: ~2.3 hours
Chat Translation:
- Total items: 10,747,733
- Throughput: 200 req/sec
- Estimated time: ~15 hours
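The same arithmetic applies to any item count and whatever throughput you actually observe in the logs (a sketch; plug in your own numbers):
# ETA in hours = items / (rate * 3600), e.g. the ASR corpus at 200 req/sec
awk -v items=1647738 -v rate=200 'BEGIN { printf "ETA: %.1f hours\n", items / (rate * 3600) }'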
🎯 Best Practices
- Monitor early: Watch first 1000 items for any issues
- Check quality: Validate samples periodically
- Resource balance: Don't overload VLLM server
- Backup logs: Keep logs for debugging
- Resume friendly: Use default resume mode
- Checkpoint often: Keep checkpoint interval reasonable
📞 Quick Reference Commands
# Start production
bash scripts/run_production_full.sh
# Monitor
tail -f logs/asr_production.log
tail -f logs/chat_production.log
# Check progress
watch -n 5 'wc -l outputs/*/*/results.jsonl'
# Stop gracefully
bash scripts/stop_production.sh
# Validate
python scripts/validate_asr_output.py outputs/asr_translation/asr_run_*/results.jsonl
python scripts/validate_chat_output.py outputs/chat_translation/chat_run_*/results.jsonl
✨ Summary
Configuration: 32 workers per pipeline = 64 total workers
Resume: Automatic, enabled by default
Saving: Incremental, real-time
Monitoring: Live logs and progress tracking
Recovery: Checkpoint-based, no data loss
Ready for production! 🚀