# Multi-Agent Video Generation System - Architecture Overview
## 🎯 System Purpose
This **multi-agent system** automatically generates educational videos with Manim (Mathematical Animation Engine). Coordinated AI agents turn textual descriptions of mathematical concepts, theorems, and other educational content into animated videos by planning scenes, generating Manim code, rendering it, and evaluating the result.
## 🏗️ System Architecture
```mermaid
flowchart TD
%% Input Layer
U["User Input
(Topic & Context)"]:::input
GV["generate_video.py
(Main Orchestrator)"]:::input
ES["evaluate.py
(Quality Assessment)"]:::input
%% Configuration and Data
CONF["Configuration
(.env, src/config)"]:::config
DATA["Data Repository
(data/)"]:::data
%% Core Generation Pipeline
subgraph "Core Multi-Agent Pipeline"
CG["Code Generation Agent
(src/core/code_generator.py)"]:::core
VP["Video Planning Agent
(src/core/video_planner.py)"]:::core
VR["Video Rendering Agent
(src/core/video_renderer.py)"]:::core
end
%% Retrieval & Augmentation (RAG)
RAG["RAG Intelligence Agent
(src/rag/rag_integration.py,
src/rag/vector_store.py)"]:::rag
%% Task & Prompt Generation
TASK["Task & Prompt Generation
(task_generator/)"]:::task
%% External LLM & Model Tools
LLM["LLM Provider Agents
(mllm_tools/)"]:::ai
%% Voiceover & Utilities
VOX["Utility Services
(src/utils/)"]:::voice
%% Evaluation Module
EVAL["Quality Evaluation Agent
(eval_suite/)"]:::eval
%% Connections
U -->|"provides data"| GV
GV -->|"reads configuration"| CONF
CONF -->|"configures processing"| CG
CONF -->|"fetches theorem data"| DATA
%% Core Pipeline Flow
GV -->|"orchestrates generation"| VP
VP -->|"sends scene plans"| CG
CG -->|"sends Manim code"| VR
VR -->|"integrates audio"| VOX
VOX -->|"produces final video"| EVAL
%% Cross Module Integrations
TASK -->|"supplies prompt templates"| CG
TASK -->|"guides scene planning"| VP
CG -->|"augments with retrieval"| RAG
VP -->|"queries documentation"| RAG
LLM -->|"supports AI generation"| CG
LLM -->|"supports task generation"| TASK
%% Evaluation Script
ES -->|"evaluates output"| EVAL
%% Styles
classDef input fill:#FFD580,stroke:#333,stroke-width:2px;
classDef config fill:#B3E5FC,stroke:#333,stroke-width:2px;
classDef data fill:#C8E6C9,stroke:#333,stroke-width:2px;
classDef core fill:#FFF59D,stroke:#333,stroke-width:2px;
classDef rag fill:#FFCC80,stroke:#333,stroke-width:2px;
classDef task fill:#D1C4E9,stroke:#333,stroke-width:2px;
classDef ai fill:#B2EBF2,stroke:#333,stroke-width:2px;
classDef voice fill:#FFE0B2,stroke:#333,stroke-width:2px;
classDef eval fill:#E1BEE7,stroke:#333,stroke-width:2px;
```
## 🤖 Core Agents & Responsibilities
### 1. **🎬 Video Planning Agent** (`src/core/video_planner.py`)
**Role**: Strategic planning and scene orchestration
**Key Capabilities**:
- Scene outline generation and decomposition
- Storyboard creation with visual descriptions
- Technical implementation planning
- Concurrent implementation planning across multiple scenes
- Context learning from previous examples
- RAG integration for Manim documentation retrieval
**Key Methods**:
- `generate_scene_outline()` - Creates overall video structure
- `generate_scene_implementation_concurrently_enhanced()` - Parallel scene planning
- `_initialize_context_examples()` - Loads learning contexts
### 2. **⚡ Code Generation Agent** (`src/core/code_generator.py`)
**Role**: Manim code synthesis and optimization
**Key Capabilities**:
- Intelligent Manim code generation from scene descriptions
- Automatic error detection and fixing
- Visual self-reflection for code quality
- RAG-enhanced code generation with documentation context
- Context learning from successful examples
- Filtering of banned reasoning patterns
**Key Methods**:
- `generate_manim_code()` - Primary code generation
- `fix_code_errors()` - Intelligent error correction
- `visual_self_reflection()` - Quality validation
### 3. **🎞️ Video Rendering Agent** (`src/core/video_renderer.py`)
**Role**: Video compilation and optimization
**Key Capabilities**:
- Optimized Manim scene rendering
- Caching of rendered scenes, keyed by a hash of the scene code
- Parallel scene processing
- Quality preset management (preview/low/medium/high/production)
- GPU acceleration support
- Video combination and assembly
**Key Methods**:
- `render_scene_optimized()` - Enhanced scene rendering
- `combine_videos_optimized()` - Final video assembly
- `_get_code_hash()` - Hashes scene code to produce the cache key
### 4. **🔍 RAG Intelligence Agent** (`src/rag/rag_integration.py`, `src/rag/vector_store.py`)
**Role**: Knowledge retrieval and context augmentation
**Key Capabilities**:
- Manim documentation retrieval
- Plugin detection and relevance scoring
- Vector store management with ChromaDB
- Query generation for technical contexts
- Enhanced document embedding and retrieval
**Key Methods**:
- `detect_relevant_plugins()` - Smart plugin identification
- `retrieve_relevant_docs()` - Context-aware documentation retrieval
- `generate_rag_queries()` - Intelligent query formulation
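The retrieval step can be illustrated with a minimal sketch. To stay dependency-free it swaps the ChromaDB vector store for an in-memory bag-of-words index ranked by cosine similarity. The function `retrieve_relevant_docs_sketch` and the sample snippets in `DOCS` are hypothetical and are not the project's actual implementation.

```python
import math
import re
from collections import Counter


def _vectorize(text: str) -> Counter:
    """Turn text into a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_relevant_docs_sketch(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (hypothetical helper)."""
    q = _vectorize(query)
    ranked = sorted(docs, key=lambda d: _cosine(q, _vectorize(d)), reverse=True)
    return ranked[:k]


# Illustrative snippets only; they are not taken from the real documentation corpus.
DOCS = [
    "Circle creates a circle mobject with a given radius",
    "Transform morphs one mobject into another mobject",
    "Axes creates a coordinate system with x and y ranges",
]

top = retrieve_relevant_docs_sketch("draw a circle with radius 2", DOCS, k=1)
```

A real vector store replaces the term-frequency vectors with learned embeddings, but the ranking idea (score every document against the query, keep the top `k`) is the same.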
### 5. **📝 Task & Prompt Generation Service** (`task_generator/`)
**Role**: Template management and prompt engineering
**Key Capabilities**:
- Dynamic prompt template generation
- Context-aware prompt customization
- Banned reasoning pattern management
- Multi-modal prompt support
**Key Components**:
- `parse_prompt.py` - Template processing
- `prompts_raw/` - Prompt template repository
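As a rough illustration of template-based prompt construction, the sketch below fills a prompt with the standard library's `string.Template`. The template text, the field names, and the `build_prompt` helper are assumptions made for this example; the real templates live in `prompts_raw/` and may be structured differently.

```python
from string import Template

# Hypothetical template text, written only for this example.
SCENE_PLAN_TEMPLATE = Template(
    "You are planning scene $scene_number of a video about $topic.\n"
    "Context: $context\n"
    "Avoid these reasoning patterns: $banned"
)


def build_prompt(topic: str, context: str, scene_number: int, banned: list[str]) -> str:
    """Fill the template; substitute() raises KeyError if a field is missing."""
    return SCENE_PLAN_TEMPLATE.substitute(
        topic=topic,
        context=context,
        scene_number=scene_number,
        banned=", ".join(banned) or "(none)",
    )


prompt = build_prompt("Pythagorean Theorem", "visual proof", 1, [])
```

Using `substitute()` rather than `safe_substitute()` makes a missing field fail loudly instead of leaving a stray `$placeholder` in the prompt.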
### 6. **🤖 LLM Provider Agents** (`mllm_tools/`)
**Role**: AI model abstraction and management
**Key Capabilities**:
- Multi-provider LLM support (OpenAI, Gemini, Vertex AI, OpenRouter)
- Unified interface for different AI models
- Cost tracking and usage monitoring
- Langfuse integration for observability
**Key Components**:
- `litellm.py` - LiteLLM wrapper for multiple providers
- `openrouter.py` - OpenRouter integration
- `gemini.py` - Google Gemini integration
- `vertex_ai.py` - Google Cloud Vertex AI
### 7. **✅ Quality Evaluation Agent** (`eval_suite/`)
**Role**: Output validation and quality assurance
**Key Capabilities**:
- Multi-modal content evaluation (text, image, video)
- Automated quality scoring
- Error pattern detection
- Performance metrics collection
**Key Components**:
- `text_utils.py` - Text quality evaluation
- `image_utils.py` - Visual content assessment
- `video_utils.py` - Video quality metrics
## 🔄 Multi-Agent Workflow
### **Phase 1: Initialization & Planning**
1. **System Orchestrator** (`generate_video.py`) receives user input
2. **Configuration Manager** loads system settings and model configurations
3. **Session Manager** creates/loads session for continuity
4. **Video Planning Agent** analyzes topic and creates scene breakdown
5. **RAG Agent** detects relevant plugins and retrieves documentation
### **Phase 2: Implementation Planning**
1. **Video Planning Agent** generates detailed implementation plans for each scene
2. **Task Generator** provides appropriate prompt templates
3. **RAG Agent** augments plans with relevant technical documentation
4. **Scene Analyzer** validates plan completeness
### **Phase 3: Code Generation**
1. **Code Generation Agent** transforms scene plans into Manim code
2. **RAG Agent** provides contextual documentation for complex animations
3. **Error Detection** validates code syntax and logic
4. **Quality Assurance** ensures code meets standards
### **Phase 4: Scene Rendering**
1. **Video Rendering Agent** executes Manim code to generate scenes
2. **Caching System** reuses earlier renders when a scene's code hash is unchanged
3. **Parallel Processing** renders multiple scenes concurrently
4. **Quality Control** validates rendered output
### **Phase 5: Final Assembly**
1. **Video Rendering Agent** combines individual scenes
2. **Audio Integration** adds voiceovers and sound effects
3. **Quality Evaluation Agent** performs final validation
4. **Output Manager** delivers final video with metadata
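The phase sequence above can be sketched as a chain of functions that pass a shared state object along. Every name here (`PipelineState`, `plan`, `generate_code`, `render`, `assemble`, `run_pipeline`) is hypothetical, and the phase bodies are placeholders standing in for the real agents.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PipelineState:
    """Data handed from one phase to the next (illustrative fields only)."""
    topic: str
    scene_plans: list[str] = field(default_factory=list)
    scene_code: list[str] = field(default_factory=list)
    rendered: list[str] = field(default_factory=list)
    final_video: str = ""


def plan(state: PipelineState) -> PipelineState:
    # Placeholder for the Video Planning Agent.
    state.scene_plans = [f"{state.topic}: scene {i}" for i in (1, 2)]
    return state


def generate_code(state: PipelineState) -> PipelineState:
    # Placeholder for the Code Generation Agent.
    state.scene_code = [f"# Manim code for {p}" for p in state.scene_plans]
    return state


def render(state: PipelineState) -> PipelineState:
    # Placeholder for the Video Rendering Agent.
    state.rendered = [f"scene_{i}.mp4" for i, _ in enumerate(state.scene_code, start=1)]
    return state


def assemble(state: PipelineState) -> PipelineState:
    # Placeholder for final assembly of the rendered scenes.
    state.final_video = f"combined({len(state.rendered)} scenes)"
    return state


PHASES: list[Callable[[PipelineState], PipelineState]] = [plan, generate_code, render, assemble]


def run_pipeline(topic: str) -> PipelineState:
    """Run each phase in order, handing the state from one to the next."""
    state = PipelineState(topic)
    for phase in PHASES:
        state = phase(state)
    return state
```

Keeping the phases in a list makes the handoff order explicit and lets a phase be inserted or replaced without touching the others.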
## 🏛️ Design Principles
### **SOLID Principles Implementation**
1. **Single Responsibility Principle**
- Each agent has a focused, well-defined purpose
- Clear separation of concerns across components
2. **Open/Closed Principle**
- System extensible through composition and interfaces
- New agents can be added without modifying existing code
3. **Liskov Substitution Principle**
- Agents implement common interfaces for interchangeability
- Protocol-based design ensures compatibility
4. **Interface Segregation Principle**
- Clean, focused interfaces for agent communication
- No forced dependencies on unused functionality
5. **Dependency Inversion Principle**
- High-level modules depend on abstractions
- Factory pattern for component creation
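A minimal sketch of how protocol-based interfaces and a factory might fit together, assuming a single-method text-completion interface. `TextModel`, `EchoModel`, `UpperModel`, `create_model`, and `summarize` are hypothetical names invented for this example and do not come from the codebase.

```python
from typing import Protocol


class TextModel(Protocol):
    """The abstraction that high-level code depends on."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in provider that echoes the prompt back."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperModel:
    """Second stand-in provider, interchangeable with EchoModel."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


_REGISTRY: dict[str, type] = {"echo": EchoModel, "upper": UpperModel}


def create_model(name: str) -> TextModel:
    """Factory: build a provider by name without the caller importing its class."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown model provider: {name!r}") from None


def summarize(model: TextModel, text: str) -> str:
    """High-level code that works with any object satisfying TextModel."""
    return model.complete(f"Summarize: {text}")
```

Because `summarize` only relies on the `complete` method, a new provider can be added by registering another class, with no change to the calling code.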
### **Multi-Agent Coordination Patterns**
1. **Pipeline Architecture**: Sequential processing with clear handoffs
2. **Publish-Subscribe**: Event-driven communication between agents
3. **Factory Pattern**: Dynamic agent creation and configuration
4. **Strategy Pattern**: Pluggable algorithms for different tasks
5. **Observer Pattern**: Monitoring and logging across agents
## ⚡ Performance Optimizations
### **Concurrency & Parallelization**
- **Async/Await**: Non-blocking agent coordination
- **Semaphore Control**: Caps concurrent scene processing at `max_scene_concurrency`
- **Thread Pools**: Parallel I/O operations
- **Concurrent Scene Processing**: Multiple scenes rendered simultaneously
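The combination of async/await and a semaphore can be sketched as follows. `process_scene` and `process_all_scenes` are hypothetical names, and the `asyncio.sleep` call is only a stand-in for an LLM request or a render; the `tracker` dictionary exists solely to show that the concurrency limit holds.

```python
import asyncio


async def process_scene(scene_id: int, semaphore: asyncio.Semaphore, tracker: dict) -> str:
    """Process one scene while holding a semaphore slot."""
    async with semaphore:
        tracker["active"] += 1
        tracker["peak"] = max(tracker["peak"], tracker["active"])
        await asyncio.sleep(0.01)  # stand-in for an LLM call or a render
        tracker["active"] -= 1
    return f"scene_{scene_id}_done"


async def process_all_scenes(n_scenes: int, max_scene_concurrency: int = 5) -> tuple[list[str], int]:
    """Start all scenes at once; the semaphore limits how many run together."""
    semaphore = asyncio.Semaphore(max_scene_concurrency)
    tracker = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(process_scene(i, semaphore, tracker) for i in range(1, n_scenes + 1))
    )
    return list(results), tracker["peak"]
```

Calling `asyncio.run(process_all_scenes(8, max_scene_concurrency=3))` returns the eight results in submission order, since `asyncio.gather` preserves the order of its arguments, together with a peak concurrency that never exceeds three.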
### **Intelligent Caching**
- **Code Hash-based Caching**: Avoid redundant renders
- **Context Caching**: Reuse prompt templates and examples
- **Vector Store Caching**: Optimized document retrieval
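Code hash-based caching can be illustrated with a short sketch: hash the scene's source code and reuse the stored result when the same hash appears again. The `RenderCache` class, the choice of SHA-256, and the in-memory dictionary are assumptions for this example; the actual renderer may hash differently and store results on disk.

```python
import hashlib
from typing import Callable


def get_code_hash(code: str) -> str:
    """Hash the scene source so identical code maps to the same cache key."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


class RenderCache:
    """In-memory cache from code hash to rendered video path (hypothetical)."""

    def __init__(self) -> None:
        self._videos: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def render(self, code: str, render_fn: Callable[[str], str]) -> str:
        """Return the cached video for this code, rendering only on a miss."""
        key = get_code_hash(code)
        if key in self._videos:
            self.hits += 1
            return self._videos[key]
        self.misses += 1
        self._videos[key] = render_fn(code)
        return self._videos[key]
```

The `hits` and `misses` counters make it straightforward to report a cache hit rate alongside other pipeline metrics.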
### **Resource Management**
- **GPU Acceleration**: Hardware-accelerated rendering
- **Memory Optimization**: Efficient data structures
- **Quality Presets**: Speed vs. quality tradeoffs
## 🔧 Configuration Management
### **Environment Configuration** (`.env`, `src/config/config.py`)
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoGenerationConfig:
    planner_model: str                      # Primary AI model
    scene_model: Optional[str] = None       # Scene-specific model
    helper_model: Optional[str] = None      # Helper tasks model
    max_scene_concurrency: int = 5          # Parallel scene limit
    use_rag: bool = False                   # RAG integration
    enable_caching: bool = True             # Performance caching
    use_gpu_acceleration: bool = False      # Hardware acceleration
```
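One way such a configuration could be populated from environment variables is sketched below. The variable names (`PLANNER_MODEL`, `USE_RAG`, and so on) and the helpers `_env_bool` and `config_from_env` are assumptions for illustration; check `src/config/config.py` and the project's `.env` template for the names it actually reads.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class VideoGenerationConfig:
    planner_model: str
    scene_model: Optional[str] = None
    helper_model: Optional[str] = None
    max_scene_concurrency: int = 5
    use_rag: bool = False
    enable_caching: bool = True
    use_gpu_acceleration: bool = False


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean flag, falling back to the default when unset."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def config_from_env(env: Optional[Mapping[str, str]] = None) -> VideoGenerationConfig:
    """Build a config from a mapping; the variable names are hypothetical."""
    env = os.environ if env is None else env
    return VideoGenerationConfig(
        planner_model=env["PLANNER_MODEL"],  # required: KeyError if missing
        scene_model=env.get("SCENE_MODEL"),
        helper_model=env.get("HELPER_MODEL"),
        max_scene_concurrency=int(env.get("MAX_SCENE_CONCURRENCY", "5")),
        use_rag=_env_bool(env, "USE_RAG", False),
        enable_caching=_env_bool(env, "ENABLE_CACHING", True),
        use_gpu_acceleration=_env_bool(env, "USE_GPU_ACCELERATION", False),
    )
```

Accepting an explicit mapping instead of always reading `os.environ` keeps the loader easy to test.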
### **Model Provider Configuration**
- Support for multiple LLM providers (OpenAI, Gemini, Claude, etc.)
- Unified interface through LiteLLM
- Cost tracking and usage monitoring
- Automatic failover capabilities
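Failover across providers might look like the sketch below: try each provider in order and return the first successful response. `complete_with_failover`, `AllProvidersFailed`, and the provider callables are hypothetical; this is not LiteLLM's API, and a production version would likely catch narrower exception types and add backoff.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider has failed."""


def complete_with_failover(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Return (provider_name, response) from the first provider that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose in this sketch
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Returning the provider name alongside the response makes it possible to attribute cost and latency to whichever provider actually served the request.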
## 📊 Data Flow Architecture
### **Input Data Sources**
- **Theorem Datasets**: JSON files with mathematical concepts (`data/thb_*/`)
- **Context Learning**: Historical examples (`data/context_learning/`)
- **RAG Documentation**: Manim docs and plugins (`data/rag/manim_docs/`)
### **Processing Pipeline**
```
User Input    → Topic Analysis → Scene Planning → Code Generation → Rendering    → Quality Check → Final Output
     ↓               ↓                ↓                ↓                 ↓              ↓
Configuration → RAG Context    → Implementation → Error Fixing    → Optimization → Validation
```
### **Output Artifacts**
- **Scene Outlines**: Structured video plans
- **Implementation Plans**: Technical specifications
- **Manim Code**: Executable animation scripts
- **Rendered Videos**: Individual scene outputs
- **Combined Videos**: Final assembled content
- **Metadata**: Processing logs and metrics
## 🎪 Advanced Features
### **Error Recovery & Self-Healing**
- **Multi-layer Retry Logic**: Automatic error recovery at each agent level
- **Intelligent Error Analysis**: Pattern recognition for common failures
- **Self-Reflection**: Code quality validation through visual analysis
- **Fallback Strategies**: Alternative approaches when primary methods fail
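The generate, validate, and fix cycle behind this kind of retry logic can be sketched in a few lines. Here validation is reduced to a Python syntax check with the built-in `compile()`, and the "fix" step is a plain callable; in the real system both steps would involve an LLM and an actual Manim render. `syntax_error` and `generate_with_fixes` are hypothetical names.

```python
from typing import Callable, Optional


def syntax_error(code: str) -> Optional[str]:
    """Return the syntax error message for `code`, or None if it compiles."""
    try:
        compile(code, "<scene>", "exec")
    except SyntaxError as exc:
        return str(exc)
    return None


def generate_with_fixes(
    generate: Callable[[], str],
    validate: Callable[[str], Optional[str]],
    fix: Callable[[str, str], str],
    max_retries: int = 3,
) -> tuple[str, int]:
    """Return (valid_code, number_of_fixes_applied) or raise after max_retries."""
    code = generate()
    for attempt in range(max_retries + 1):
        error = validate(code)
        if error is None:
            return code, attempt
        if attempt < max_retries:
            code = fix(code, error)  # feed the error back into the fixer
    raise RuntimeError(f"code still invalid after {max_retries} fixes: {error}")
```

Passing the error message to `fix` is the key design point: the fixer works from the concrete failure rather than regenerating blindly.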
### **Monitoring & Observability**
- **Langfuse Integration**: Comprehensive LLM call tracking
- **Performance Metrics**: Render times, success rates, resource usage
- **Status Dashboard**: Real-time pipeline state visualization
- **Cost Tracking**: Token usage and API cost monitoring
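Token and cost tracking can be reduced to a small accumulator, sketched below. The `UsageTracker` class is hypothetical, and the per-1k-token prices are caller-supplied placeholders rather than real provider rates; Langfuse or the provider's own usage reports would be the source of truth in practice.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulates token counts and derives cost from caller-supplied prices."""
    price_per_1k_input: float
    price_per_1k_output: float
    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add one LLM call's token usage to the running totals."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.calls += 1

    @property
    def cost(self) -> float:
        """Total cost in the same currency unit as the supplied prices."""
        return (
            self.input_tokens * self.price_per_1k_input
            + self.output_tokens * self.price_per_1k_output
        ) / 1000
```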
### **Scalability Features**
- **Horizontal Scaling**: Multiple concurrent topic processing
- **Resource Pooling**: Shared computational resources
- **Load Balancing**: Intelligent task distribution
- **State Persistence**: Resume interrupted processing
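Resuming interrupted work usually comes down to recording which units have finished and skipping them on the next run. The sketch below keeps that record in a JSON file; the file layout, the `completed_scenes` key, and all function names are assumptions for illustration, not the project's session format.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_completed(path: Path) -> set[int]:
    """Read the set of finished scene IDs, or an empty set on first run."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text())["completed_scenes"])


def save_completed(path: Path, completed: set[int]) -> None:
    """Persist the finished scene IDs as sorted JSON."""
    path.write_text(json.dumps({"completed_scenes": sorted(completed)}))


def process_scenes(scene_ids: list[int], path: Path, work: Callable[[int], None]) -> list[int]:
    """Process only unfinished scenes, checkpointing after each one."""
    completed = load_completed(path)
    processed_now = []
    for scene_id in scene_ids:
        if scene_id in completed:
            continue  # finished in an earlier run
        work(scene_id)
        completed.add(scene_id)
        save_completed(path, completed)
        processed_now.append(scene_id)
    return processed_now


def demo_resume() -> tuple[list[int], list[int]]:
    """Run twice against one checkpoint; the second run skips finished scenes."""
    with tempfile.TemporaryDirectory() as tmp:
        checkpoint = Path(tmp) / "session_state.json"
        first = process_scenes([1, 2], checkpoint, work=lambda s: None)
        second = process_scenes([1, 2, 3], checkpoint, work=lambda s: None)
    return first, second
```

Writing the checkpoint after every scene, rather than once at the end, is what allows a crash mid-run to lose at most one scene's work.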
## 🚀 Usage Examples
### **Single Topic Generation**
```bash
python generate_video.py \
--topic "Pythagorean Theorem" \
--context "Explain the mathematical proof and visual demonstration" \
--model "gemini/gemini-2.5-flash-preview-04-17" \
--use_rag \
--quality medium
```
### **Batch Processing**
```bash
python generate_video.py \
--theorems_path data/thb_easy/math.json \
--sample_size 5 \
--max_scene_concurrency 3 \
--use_context_learning \
--enable_caching
```
### **Status Monitoring**
```bash
python generate_video.py \
--theorems_path data/thb_easy/math.json \
--check_status
```
## 📈 System Metrics & KPIs
### **Performance Indicators**
- **Scene Generation Speed**: Average time per scene
- **Rendering Efficiency**: Cache hit rates and parallel utilization
- **Quality Scores**: Automated evaluation metrics
- **Success Rates**: Completion percentage across pipeline stages
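These indicators can be computed from per-scene records, as in the sketch below. The record fields (`seconds`, `success`, `cache_hit`) and the `summarize_runs` helper are hypothetical; the pipeline's actual logs may use a different schema.

```python
from statistics import mean


def summarize_runs(records: list[dict]) -> dict:
    """Aggregate per-scene records into average time, success and cache hit rates."""
    total = len(records)
    if total == 0:
        return {"avg_scene_seconds": 0.0, "success_rate": 0.0, "cache_hit_rate": 0.0}
    return {
        "avg_scene_seconds": mean(r["seconds"] for r in records),
        "success_rate": sum(bool(r["success"]) for r in records) / total,
        "cache_hit_rate": sum(bool(r["cache_hit"]) for r in records) / total,
    }


# Made-up records, used only to exercise the function.
sample = [
    {"seconds": 10, "success": True, "cache_hit": False},
    {"seconds": 20, "success": True, "cache_hit": True},
    {"seconds": 30, "success": False, "cache_hit": False},
]
summary = summarize_runs(sample)
```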
### **Resource Utilization**
- **LLM Token Usage**: Cost optimization and efficiency
- **Computational Resources**: CPU/GPU utilization
- **Storage Efficiency**: Cache effectiveness and data management
- **Memory Footprint**: System resource consumption
## 🔮 Future Enhancements
### **Planned Agent Improvements**
- **Advanced Visual Agent**: Enhanced image understanding and generation
- **Audio Synthesis Agent**: Dynamic voiceover generation
- **Interactive Agent**: Real-time user feedback integration
- **Curriculum Agent**: Adaptive learning path generation
### **Technical Roadmap**
- **Distributed Processing**: Multi-node agent deployment
- **Real-time Streaming**: Live video generation capabilities
- **Mobile Integration**: Responsive design for mobile platforms
- **API Gateway**: RESTful service architecture
---
## 📚 Related Documentation
- **[API Reference](docs/api_reference.md)** - Detailed method documentation
- **[Configuration Guide](docs/configuration.md)** - Setup and customization
- **[Development Guide](docs/development.md)** - Contributing and extending
- **[Troubleshooting](docs/troubleshooting.md)** - Common issues and solutions
---
**Last Updated**: August 25, 2025
**Version**: Multi-Agent Enhanced Pipeline v2.0
**Maintainer**: T2M Development Team