---
title: MedLLM Agent
emoji: 🩺
colorFrom: pink
colorTo: red
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
short_description: 'Medical MCP agentic RAG & Search with MedSwin'
tags:
- mcp-in-action-track-enterprise
- mcp-in-action-track-creative
- building-mcp-track-enterprise
- building-mcp-track-creative
---
[App Demo](https://huggingface.co/spaces/MCP-1st-Birthday/MedLLM-Agent)
[Follow MedSwin](https://huggingface.co/MedSwin)
[LinkedIn Post](https://www.linkedin.com/posts/dang-khoa-le-96a6332a8_medllm-agent-a-hugging-face-space-by-mcp-activity-7396162709618225152-WSt4?utm_source=social_share_send&utm_medium=member_desktop_web&rcm=ACoAAEoh1KIB2Np4-PEynHrT7uchKFL5ByBIpio)
# 🩺 MedLLM Agent
**Advanced Medical AI Assistant** powered by fine-tuned MedSwin models with comprehensive knowledge retrieval capabilities.
## ✨ Key Features
### **Document RAG (Retrieval-Augmented Generation)**
- Upload medical documents (PDF, Word, TXT, MD, JSON, XML, CSV) and get answers based on your uploaded content
- Document parsing powered by Gemini MCP for accurate text extraction
- Hierarchical document indexing with auto-merging retrieval for comprehensive context
- Mitigates hallucination by grounding responses in your documents
- Toggle RAG on/off - when disabled, provides concise clinical answers without document context
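The auto-merging idea above can be illustrated with a small, library-free sketch. It is not the app's LlamaIndex code: the `auto_merge` function, the chunk ids, and the 0.5 default threshold are assumptions made for the example. When enough leaf chunks of the same parent are retrieved, they are swapped for the parent chunk so the model sees a wider, contiguous context.

```python
from collections import defaultdict


def auto_merge(retrieved_leaf_ids, parent_of, children_of, merge_threshold=0.5):
    """Replace retrieved leaf chunks with their parent chunk when the share of
    that parent's children that were retrieved exceeds merge_threshold."""
    hits = defaultdict(set)
    for leaf in retrieved_leaf_ids:
        hits[parent_of[leaf]].add(leaf)

    merged, seen_parents = [], set()
    for leaf in retrieved_leaf_ids:
        parent = parent_of[leaf]
        ratio = len(hits[parent]) / len(children_of[parent])
        if ratio > merge_threshold:
            # Emit the parent once, in the position of its first retrieved child.
            if parent not in seen_parents:
                merged.append(parent)
                seen_parents.add(parent)
        else:
            merged.append(leaf)
    return merged


# Hypothetical two-section document, each section split into leaf chunks.
children_of = {"sec1": ["s1a", "s1b", "s1c"], "sec2": ["s2a", "s2b", "s2c", "s2d"]}
parent_of = {leaf: parent for parent, leaves in children_of.items() for leaf in leaves}

result = auto_merge(["s1a", "s1b", "s2d"], parent_of, children_of)
print(result)  # ['sec1', 's2d']
```

Two of the three `sec1` leaves were retrieved, so they merge into `sec1`; only one of four `sec2` leaves was retrieved, so it stays a leaf. Raising the merge threshold makes merging rarer and keeps contexts shorter.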
### **Web Search Integration (MCP Protocol)**
- **Native MCP Support**: Uses Model Context Protocol (MCP) tools for web search and content extraction
- **Automatic Fallback**: Gracefully falls back to direct library calls if MCP is not configured
- **Configurable MCP Servers**: Connect to any MCP-compatible search server via environment variables
- **Content Extraction**: Automatically fetches and extracts full content from search results using MCP tools
- **Automatic Summarization**: Summarizes web search results using Gemini MCP
- **Enriches Context**: Combines document RAG + web sources for comprehensive answers
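The MCP-first, fallback-second behaviour described above can be sketched as follows. This is a minimal illustration, not the app's code: `search_with_fallback`, `broken_mcp`, and `fake_direct` are hypothetical names, and the real search functions would wrap an MCP tool call and a direct library call respectively.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("search")

SearchFn = Callable[[str], list]


def search_with_fallback(query: str,
                         mcp_search: Optional[SearchFn],
                         direct_search: SearchFn) -> list:
    """Try the MCP search tool first; fall back to a direct call if MCP is
    not configured, raises, or returns nothing."""
    if mcp_search is not None:
        try:
            results = mcp_search(query)
            if results:
                return results
            logger.warning("MCP search returned nothing; falling back")
        except Exception as exc:  # any MCP failure triggers the fallback
            logger.warning("MCP search failed (%s); falling back", exc)
    return direct_search(query)


# Stand-ins for the real search functions (assumptions for this example).
def broken_mcp(query):
    raise ConnectionError("MCP server not configured")


def fake_direct(query):
    return [{"title": f"result for {query}", "source": "direct"}]


results = search_with_fallback("example query", broken_mcp, fake_direct)
print(results[0]["source"])  # direct
```

Passing the two search functions in as arguments keeps the fallback logic independent of any particular MCP server or search library.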
### 🧠 **MedSwin Medical Specialist Models**
MedLLM Agent supports multiple MedSwin models, each optimized for different use cases:
- **[MedSwin DT](https://huggingface.co/MedSwin/MedSwin-Merged-DaRE-TIES-KD-0.7)** (85% acc) - **Default model**. DaRE-TIES merged model combining Supervised Fine-Tuning (SFT) and Knowledge Distillation (KD) techniques. Best overall performance.
- **[MedSwin Nsl](https://huggingface.co/MedSwin/MedSwin-Merged-NuSLERP-KD-0.7)** (85% acc) - NuSLERP merged model from SFT and KD. Alternative high-performance option with different merging strategy.
- **[MedSwin DL](https://huggingface.co/MedSwin/MedSwin-Merged-DaRE-Linear-KD-0.7)** (84% acc) - DaRE-Linear merged model using linear interpolation for knowledge distillation.
- **[MedSwin Ti](https://huggingface.co/MedSwin/MedSwin-Merged-TIES-KD-0.7)** (84% acc) - TIES merged model from SFT and KD. Compact merged architecture.
- **[MedSwin TA](https://huggingface.co/MedSwin/MedSwin-Merged-TA-SFT-0.7)** (84% acc) - Task-Arithmetic merged model from SFT. Optimized for task-specific performance.
- **[MedSwin SFT](https://huggingface.co/MedSwin/MedSwin-7B-SFT)** (82% acc) - Supervised Fine-Tuned model. Base fine-tuned model without merging.
- **[MedSwin KD](https://huggingface.co/MedSwin/MedSwin-7B-KD)** (83% acc) - Knowledge Distillation model from MedGemma-27b-it. Distilled knowledge from larger model.
**Model Features:**
- Models download on-demand for efficient resource usage
- Fine-tuned on MedAlpaca-7B for medical domain expertise
- All models support GPU acceleration for fast inference
### **Multi-Language Support**
- Automatic language detection
- Non-English queries automatically translated to English
- Medical model processes in English
- Responses translated back to original language
- Powered by Gemini MCP for translation
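The translate, answer, translate-back flow above can be sketched as a small function. The `detect`, `translate`, and `medical_model` callables are hypothetical stand-ins (in the app they would be language detection, a Gemini MCP translation call, and a MedSwin model); the stubs below only make the data flow visible.

```python
def answer_in_user_language(query, detect, translate, medical_model):
    """Detect the query language, run the model in English, and translate
    the answer back when the query was not in English."""
    lang = detect(query)
    if lang == "en":
        return medical_model(query)
    english_query = translate(query, lang, "en")
    english_answer = medical_model(english_query)
    return translate(english_answer, "en", lang)


# Stubs for illustration only (assumptions, not real services).
def stub_detect(text):
    return "es" if text.startswith("Que") else "en"


def stub_translate(text, src, dst):
    return f"[{src}->{dst}] {text}"


def stub_model(text):
    return f"answer({text})"


english = answer_in_user_language("What is hypertension?", stub_detect, stub_translate, stub_model)
spanish = answer_in_user_language("Que es la hipertension?", stub_detect, stub_translate, stub_model)
print(english)
print(spanish)
```

English queries skip both translation calls, so the extra latency only applies to non-English input.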
### 🧾 **Clinical Intake Q&A Breakdown**
- Gemini intake triage checks whether the user's concern needs additional questioning (up to 5 follow-ups) and keeps per-session state.
- Intake agent conducts focused Q&A, then auto-summarizes the transcript into patient profile, refined problem statement, and key findings with actionable handoff notes.
- Pipeline injects both the structured insights and raw transcript back into the supervisor so downstream planning, RAG, and search stay grounded in what the patient actually said.
- Users can toggle the intake flow directly in the UI; disabling clears any pending follow-up state.
A sample [conversation](sample.md) between the MAC system and a patient is recorded.
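The per-session intake state described above could be modelled roughly as below. This is a sketch under assumptions: the `IntakeSession` class and its method names are invented for illustration. Only the limit of 5 follow-ups and the rule that disabling clears pending state come from the feature list.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

MAX_FOLLOW_UPS = 5  # the feature list states up to 5 follow-up questions


@dataclass
class IntakeSession:
    enabled: bool = True
    transcript: List[Tuple[str, str]] = field(default_factory=list)  # (question, answer)
    pending_question: Optional[str] = None

    def can_ask_more(self) -> bool:
        return self.enabled and len(self.transcript) < MAX_FOLLOW_UPS

    def ask(self, question: str) -> bool:
        """Register a follow-up question; returns False once the limit is hit."""
        if not self.can_ask_more():
            return False
        self.pending_question = question
        return True

    def answer(self, text: str) -> None:
        if self.pending_question is None:
            raise RuntimeError("no follow-up question is pending")
        self.transcript.append((self.pending_question, text))
        self.pending_question = None

    def disable(self) -> None:
        """Turning the intake flow off clears any pending follow-up."""
        self.enabled = False
        self.pending_question = None


session = IntakeSession()
for i in range(7):  # try to ask more questions than allowed
    if not session.ask(f"follow-up {i}"):
        break
    session.answer(f"reply {i}")
print(len(session.transcript))  # 5
```

The accumulated `transcript` is what a summarization step would condense into the patient profile and handoff notes.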
### 🎤 **Voice Features**
- **Speech-to-Text**: Voice input transcription using Whisper large-v3-turbo (Hugging Face) with Gemini MCP fallback
- **Inline Mic Experience**: Built-in microphone widget with live recording timer that drops transcripts straight into the chat box
- **Text-to-Speech**: Voice output generation using Maya1 TTS model (optional, falls back to MCP if unavailable) plus a one-click "Play Response" control for the latest answer
- **Model Status Display**: Real-time status for MedSwin, TTS (maya1), and ASR (Whisper) models
### 🤖 **MAC Architecture (Multi-Agent Collaboration)**
- **Gemini Supervisor**: Orchestrates query processing by breaking queries into flexible sub-topics (up to 10 based on complexity, explores different approaches)
- **MedSwin Specialist**: Executes tasks sequentially, providing concise clinical answers
- **Enhanced Synthesis**: Supervisor synthesizes all MedSwin responses with clear context into comprehensive final answers
- **Iterative Improvement**: Supervisor challenges and enhances answers until confirmed optimal (up to 2 iterations)
- **History-Aware Follow-Ups**: In agentic modes, Gemini uses the last assistant answer when breaking down follow-up queries (e.g., "clarify your answer") so subtasks stay grounded in prior responses
- **Search Mode**: Gemini creates 1-4 search strategies → executes ddgs searches (1-2 sources each) → summarizes briefly
- **Conditional Search Trigger**: When search mode enabled, supervisor can trigger additional searches if answer is unclear or has gaps
- **RAG Mode**: Gemini brainstorms retrieved documents into 1-4 short contexts for MedSwin decision-making
- **Clean Output**: All internal thoughts/conversations are logged only; users see only the final answer
- **Markdown Format**: Final answers use bullet points (tables automatically converted)
- **Deterministic Mode**: `Disable agentic reasoning` switch runs MedSwin alone for offline-friendly, model-only answers
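The "tables automatically converted" behaviour listed above can be approximated with a short post-processing step. The sketch below is an assumption about how such a conversion might work, not the app's implementation; `table_to_bullets` and the sample text are hypothetical.

```python
import re

# Matches a Markdown table separator row such as "| --- | :---: |".
SEPARATOR = re.compile(r"^\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)*\|?$")


def split_row(row: str) -> list:
    return [cell.strip() for cell in row.strip().strip("|").split("|")]


def table_to_bullets(markdown: str) -> str:
    """Rewrite each Markdown table row as one bullet of 'header: cell' pairs."""
    lines = markdown.splitlines()
    out, i = [], 0
    while i < len(lines):
        is_table_start = (
            i + 1 < len(lines)
            and lines[i].strip().startswith("|")
            and SEPARATOR.match(lines[i + 1].strip())
        )
        if is_table_start:
            headers = split_row(lines[i])
            i += 2  # skip the header and separator rows
            while i < len(lines) and lines[i].strip().startswith("|"):
                cells = split_row(lines[i])
                pairs = [f"{h}: {c}" for h, c in zip(headers, cells)]
                out.append("- " + "; ".join(pairs))
                i += 1
        else:
            out.append(lines[i])
            i += 1
    return "\n".join(out)


sample = "\n".join([
    "Summary:",
    "| Topic | Note |",
    "| --- | --- |",
    "| Item A | first note |",
    "| Item B | second note |",
    "End",
])
converted = table_to_bullets(sample)
print(converted)
```

Text outside tables passes through unchanged, so the function can be applied to every final answer.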
### ⚡ **Adaptive Strategy & Diagnostics**
- **Autonomous Planner**: Gemini reasoning now enables/disables RAG and web search dynamically per query while respecting user toggles.
- **Parallel Search Flow**: Multi-strategy web lookups run concurrently with cached MCP tool discovery and shared embeddings to cut latency.
- **Pipeline Telemetry**: Every session logs stage durations, strategy decisions, and search outcomes for fast troubleshooting and quality tracking.
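Concurrent search strategies and per-stage timing, as described above, can be sketched with the standard library alone. The names `stage`, `run_strategies`, and `fake_search` are assumptions for illustration; a real search function would call an MCP tool or a search library instead of sleeping.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

timings = {}  # stage name -> duration in seconds


@contextmanager
def stage(name):
    """Record how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


def run_strategies(strategies, search_fn, max_workers=4):
    """Run one search per strategy concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(search_fn, strategies))


def fake_search(strategy):
    time.sleep(0.05)  # simulate network latency
    return f"summary for {strategy}"


with stage("web_search"):
    results = run_strategies(["guidelines", "recent trials", "drug safety"], fake_search)

print(results)
print({name: round(seconds, 3) for name, seconds in timings.items()})
```

Because the searches are I/O-bound, threads are enough to overlap their waiting time; the recorded durations can then be written to the session log.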
### ⚙️ **Advanced Configuration**
- Customizable generation parameters (temperature, top-p, top-k)
- Adjustable retrieval settings (top-k, merge threshold)
- Increased max tokens to prevent early stopping
- Custom EOS handling for medical models
- Dynamic system prompts based on RAG status
- One-click agentic toggle to run MedSwin alone (no RAG/web search) for deterministic, offline-safe answers
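As a rough illustration of the settings above, generation parameters can be grouped in a validated dataclass, and the system prompt can be chosen from the RAG toggle. Everything here is hypothetical: the default values, the `GenerationConfig` name, and the prompt wording are placeholders, not the app's actual configuration.

```python
from dataclasses import dataclass


@dataclass
class GenerationConfig:
    # Placeholder defaults for illustration, not the app's real values.
    temperature: float = 0.2
    top_p: float = 0.7
    top_k: int = 50
    max_new_tokens: int = 2048

    def validate(self) -> None:
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be in [0, 2]")
        if not 0.0 < self.top_p <= 1.0:
            raise ValueError("top_p must be in (0, 1]")
        if self.top_k < 1 or self.max_new_tokens < 1:
            raise ValueError("top_k and max_new_tokens must be positive")


def build_system_prompt(rag_enabled: bool) -> str:
    """Pick a system prompt depending on whether document context is supplied."""
    base = "You are a careful medical assistant."
    if rag_enabled:
        return base + " Answer using only the provided document context and say so when it is insufficient."
    return base + " Give a concise clinical answer from your own knowledge."


config = GenerationConfig(temperature=0.3)
config.validate()
prompt = build_system_prompt(rag_enabled=True)
print(prompt)
```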
## Usage
1. **Upload Documents**: Drag and drop PDF, Word, or text files containing medical information
2. **Configure Settings**:
   - Enable/disable Document RAG
   - Enable/disable Web Search (MCP)
   - Select medical model (MedSwin DT/Nsl/DL/Ti/TA/SFT/KD)
3. **Ask Questions**: Type your medical question in any language
4. **Get Answers**: Receive comprehensive answers based on:
   - Your uploaded documents (if RAG enabled)
   - Web sources (if web search enabled)
   - Medical model's training knowledge
## 🔧 Technical Details
- **Medical Models**: MedSwin/MedSwin-7B-SFT, MedSwin-7B-KD, MedSwin-Merged-TA-SFT-0.7, MedSwin-Merged-TIES-KD-0.7, MedSwin-Merged-DaRE-Linear-KD-0.7, MedSwin-Merged-NuSLERP-KD-0.7, MedSwin-Merged-DaRE-TIES-KD-0.7
- **Architecture**: MAC (Multi-Agent Collaboration) - Gemini Supervisor + MedSwin Specialist
- **Translation**: Gemini MCP (gemini-2.5-flash-lite)
- **Document Parsing**: Gemini MCP (PDF, Word, TXT, MD, JSON, XML, CSV)
- **Speech-to-Text**: openai/whisper-large-v3-turbo (Hugging Face, primary) with Gemini MCP fallback
- **Supervisor Tasks**: Gemini MCP (gemini-2.5-flash) - query breakdown, search strategies, RAG brainstorming
- **MedSwin Execution**: GPU-tagged tasks for efficient inference
- **Text-to-Speech**: maya-research/maya1 (optional, with MCP fallback)
- **Embedding Model**: abhinand/MedEmbed-large-v0.1 (domain-tuned medical embeddings)
- **RAG Framework**: LlamaIndex with hierarchical node parsing and auto-merging retrieval
- **Web Search**: MCP tools with automatic fallback to DuckDuckGo
- **MCP Server**: Bundled Python-based Gemini MCP server (agent.py)
## Requirements
See `requirements.txt` for full dependency list. Key dependencies:
- **MCP Integration**: `mcp`, `nest-asyncio`, `google-genai` (for Gemini MCP server)
- **Fallback Dependencies**: `requests`, `beautifulsoup4`, `ddgs` (used when MCP web search unavailable)
- **Core ML**: `transformers`, `torch`, `accelerate`, `torchaudio`
- **RAG Framework**: `llama-index`, `llama_index.llms.huggingface`, `llama_index.embeddings.huggingface`
- **Utilities**: `langdetect`, `gradio`, `spaces`, `soundfile`
- **TTS**: Optional - `TTS` package (voice features work with MCP fallback if unavailable)
- **ASR**: Whisper via `transformers` (openai/whisper-large-v3-turbo from Hugging Face)
### MCP Configuration
The application uses a bundled Gemini MCP server (agent.py) for translation, document parsing, transcription, and summarization. Configure via environment variables.
**Setup Steps:**
1. **Install Dependencies** (already in requirements.txt):
   ```bash
   pip install mcp nest-asyncio google-genai
   ```
2. **Get Gemini API Key**:
   - Visit [Google AI Studio](https://aistudio.google.com/) to get your API key
   - Set it: `export GEMINI_API_KEY="your-api-key"`
3. **Run the Application**:
   - The bundled MCP server (agent.py) will be used automatically
   - No additional MCP server installation required
**Note**: The application requires Gemini MCP for translation, document parsing, transcription, and summarization. Web search supports fallback to direct DuckDuckGo API if MCP web search tools are unavailable.
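Since Gemini MCP is required, a startup check for the API key can fail fast with a clear message. The helper below is a sketch: `load_gemini_key` is a hypothetical name, and only the `GEMINI_API_KEY` variable comes from the setup steps above.

```python
import os


def load_gemini_key(env=os.environ) -> str:
    """Return the Gemini API key or raise a readable error if it is missing."""
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; export it before starting the app."
        )
    return key


# Demonstration with explicit dictionaries instead of the real environment.
print(load_gemini_key({"GEMINI_API_KEY": "your-api-key"}))
try:
    load_gemini_key({})
except RuntimeError as exc:
    print(exc)
```

Accepting the environment as a parameter keeps the check easy to test without touching real credentials.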
## 🎯 Use Cases
- **Clinical Decision Support**: Evidence-based answers from documents and current medical literature
- **Medical Document Q&A**: Query uploaded patient records, research papers, and clinical guidelines
- **Multi-Language Consultations**: Automatic translation for international patient care
- **Research Assistance**: Synthesize information from multiple medical sources
- **Drug Information**: Comprehensive drug information with interaction analysis
## 🏥 Enterprise-Level Clinical Decision Support
### **Empowering Medical Specialists with AI-Powered Decision Support**
MedLLM Agent is designed to support **doctors, clinicians, and medical specialists** in making informed clinical decisions by leveraging the power of Large Language Models (LLMs) and Model Context Protocol (MCP). This system transforms how medical professionals access, analyze, and synthesize medical information in real-time.
### **Key Enterprise Capabilities**
#### 1. **MAC Architecture (Multi-Agent Collaboration)**
- **Gemini Supervisor Agent**:
  - Breaks user queries into flexible sub-topics (up to 10 based on complexity, explores different approaches/angles)
  - Synthesizes all MedSwin responses with clear context into comprehensive final answers
  - Challenges and enhances answers iteratively until confirmed optimal (up to 2 iterations)
  - In search mode: creates 1-4 search strategies, executes ddgs (1-2 sources each), summarizes briefly
  - Conditional search trigger: Can trigger additional searches if answer is unclear or has gaps (only when search mode enabled)
  - In RAG mode: brainstorms retrieved documents into 1-4 concise contexts
  - All supervisor decisions logged internally, not displayed
- **MedSwin Specialist Agent**:
  - Executes tasks assigned by Gemini Supervisor (GPU-tagged)
  - Processes each sub-topic sequentially with focused context
  - Generates concise, clinically accurate answers
  - Returns Markdown format with bullet points (tables auto-converted)
#### 2. **Clean User Experience**
- **Internal Thoughts Hidden**: All Gemini-MedSwin conversations logged only
- **Final Answer Only**: Users see only the polished, final answer
- **Structured Output**: Markdown bullets, no internal planning tables
- **Efficient Processing**: Contexts kept brief to respect token limits
### **Enterprise Use Cases for Medical Specialists**
#### **Clinical Decision Support**
- **Diagnostic Assistance**: Upload patient records, lab results, and medical histories. Ask complex diagnostic questions and receive evidence-based answers grounded in your documents and current medical literature.
- **Treatment Planning**: Query treatment protocols, drug interactions, and therapeutic guidelines. The system autonomously retrieves relevant information from your clinical documents and current medical databases.
- **Drug Information & Interactions**: Get comprehensive drug information, contraindications, and interaction analyses by combining your formulary documents with up-to-date web sources.
#### **Research & Evidence Synthesis**
- **Literature Review Support**: Upload research papers, clinical trials, and medical literature. The system helps synthesize findings, identify connections, and answer research questions.
- **Clinical Guideline Analysis**: Compare and analyze multiple clinical guidelines, protocols, and best practices from your document library.
#### **Multi-Language Clinical Support**
- **International Patient Care**: Handle queries in multiple languages. The system automatically translates, processes with medical models, and translates responses back, enabling care for diverse patient populations.
#### **Real-Time Information Access**
- **Current Medical Knowledge**: Leverage MCP web search to access:
  - Latest treatment protocols
  - Recent clinical trial results
  - Updated drug information
  - Current medical guidelines
- **MCP Protocol Benefits**: Standardized, modular tool integration allows easy switching between search providers and enhanced reliability
### **How It Works: MAC Architecture in Action**
1. **Gemini Supervisor - Query Breakdown** → Analyzes query and breaks into flexible sub-topics (up to 10 based on complexity):
   - Example: "What are the treatment options for Type 2 diabetes in elderly patients with renal impairment?"
   - Explores different approaches (clinical, diagnostic, treatment, prevention perspectives)
   - Creates structured sub-topics: treatment options, age considerations, renal function impact, drug interactions, monitoring protocols
   - Number of subtasks adapts to query complexity (not limited to 4)
   - All planning logged internally, not displayed to user
2. **Gemini Supervisor - Context Preparation**:
   - **Search Mode**: Creates 1-4 search strategies → executes ddgs (1-2 sources each) → summarizes briefly
   - **RAG Mode**: Retrieves documents → brainstorms into 1-4 concise contexts for MedSwin
   - Contexts kept brief to respect MedSwin token limits
3. **MedSwin Specialist - Task Execution** (GPU-tagged):
   - Executes each sub-topic task sequentially
   - Receives focused context from Gemini Supervisor
   - Generates concise clinical answers (Markdown bullets, no tables)
   - All execution logged internally
4. **Gemini Supervisor - Answer Synthesis**:
   - Synthesizes all MedSwin responses with clear context
   - Integrates information from all sub-topics seamlessly
   - Creates coherent, comprehensive final answer
   - Provides better context than simple concatenation
5. **Gemini Supervisor - Challenge & Enhancement Loop**:
   - Evaluates answer quality (completeness, accuracy, clarity)
   - Challenges answer if not optimal
   - Provides specific enhancement instructions
   - Enhances answer iteratively (up to 2 iterations)
   - Continues until answer confirmed optimal
6. **Conditional Search Trigger** (only when search mode enabled):
   - Supervisor checks if answer is unclear or has gaps
   - If needed, generates specific search queries to fill gaps
   - Executes additional searches
   - Enhances answer with new search context
7. **Final Answer Assembly**:
   - Converts any tables to Markdown bullets
   - Adds citations if web sources used
   - Translates back if needed
   - **Only final answer displayed** - all internal thoughts remain in logs
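The control flow of steps 1 to 5 can be condensed into a short sketch. It is a simplified model under assumptions: `run_mac`, `StubSupervisor`, and `stub_specialist` are hypothetical, the stubs replace the Gemini and MedSwin calls, and context preparation, conditional search, and final assembly are left out. Only the limits (up to 10 sub-topics, up to 2 enhancement iterations) and the sequential execution come from the description above.

```python
MAX_SUBTOPICS = 10   # supervisor breaks a query into at most 10 sub-topics
MAX_ITERATIONS = 2   # challenge/enhance loop runs at most twice


def run_mac(query, supervisor, specialist):
    """Break down the query, answer sub-topics one by one, synthesize,
    then let the supervisor challenge and enhance the answer."""
    subtopics = supervisor.breakdown(query)[:MAX_SUBTOPICS]
    partials = [specialist(topic) for topic in subtopics]  # sequential execution
    answer = supervisor.synthesize(query, partials)
    for _ in range(MAX_ITERATIONS):
        feedback = supervisor.challenge(query, answer)
        if feedback is None:  # supervisor confirms the answer
            break
        answer = supervisor.enhance(answer, feedback)
    return answer


class StubSupervisor:
    """Stand-in for the Gemini supervisor; it is never satisfied, so the
    iteration cap is what ends the loop."""

    def __init__(self):
        self.enhancements = 0

    def breakdown(self, query):
        return [f"topic {i}" for i in range(12)]

    def synthesize(self, query, partials):
        return " | ".join(partials)

    def challenge(self, query, answer):
        return "needs more detail"

    def enhance(self, answer, feedback):
        self.enhancements += 1
        return answer + " +rev"


calls = []


def stub_specialist(topic):
    calls.append(topic)
    return topic.upper()


supervisor = StubSupervisor()
final = run_mac("example query", supervisor, stub_specialist)
print(len(calls), supervisor.enhancements)  # 10 2
```

Capping both the number of sub-topics and the number of enhancement rounds bounds the total number of model calls per query.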
### **Enterprise Benefits**
- **Time Efficiency**: Reduces time spent searching through documents and medical databases
- **Evidence-Based Decisions**: Grounds answers in uploaded documents and current medical literature
- **Reduced Hallucination**: RAG ensures answers are based on actual documents and verified sources
- **Comprehensive Coverage**: Combines institutional knowledge (documents) with current medical knowledge (web)
- **Enhanced Quality**: Iterative challenge loop ensures answers are optimal before delivery
- **Flexible Task Breakdown**: Adapts to query complexity with flexible subtask generation (not limited to 4 steps)
- **Intelligent Search**: Conditional search trigger fills gaps when answers are unclear
- **Better Context**: Enhanced synthesis provides clearer, more comprehensive final answers
- **Scalability**: Handles multiple languages, complex queries, and large document libraries
- **Clinical Workflow Integration**: Designed to fit into existing clinical decision-making processes
- **MCP Protocol**: Standardized tool integration for reliable, maintainable web search capabilities
### **Implementation in Clinical Settings**
- **Hospital Systems**: Clinical decision support with EMR integration and institutional medical libraries
- **Specialty Clinics**: Customize with specialty-specific documents and guidelines
- **Medical Education**: Comprehensive, evidence-based answers for training and education
- **Research Institutions**: Accelerate research by synthesizing information from multiple sources
---
**⚠️ Important Disclaimer**: This system is designed to **assist** medical professionals with information retrieval and synthesis. It does not replace clinical judgment. All medical decisions must be made by qualified healthcare professionals who consider the full clinical context, patient-specific factors, and their professional expertise.
---
> **Built for MCP-1st-Birthday Hackathon**: Enterprise-level clinical decision support system integrating MCP protocol, document RAG, and autonomous reasoning capabilities.