---
title: MedLLM Agent
emoji: 🩺
colorFrom: pink
colorTo: red
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
short_description: 'Medical MCP agentic RAG & Search with MedSwin'
tags:
  - mcp-in-action-track-enterprise
  - mcp-in-action-track-creative
  - building-mcp-track-enterprise
  - building-mcp-track-creative
  
---

[App Demo](https://huggingface.co/spaces/MCP-1st-Birthday/MedLLM-Agent)  
[Follow MedSwin](https://huggingface.co/MedSwin)  
[LinkedIn Post](https://www.linkedin.com/posts/dang-khoa-le-96a6332a8_medllm-agent-a-hugging-face-space-by-mcp-activity-7396162709618225152-WSt4?utm_source=social_share_send&utm_medium=member_desktop_web&rcm=ACoAAEoh1KIB2Np4-PEynHrT7uchKFL5ByBIpio)  

# 🩺 MedLLM Agent

**Advanced Medical AI Assistant** powered by fine-tuned MedSwin models with comprehensive knowledge retrieval capabilities.

## ✨ Key Features

### 📄 **Document RAG (Retrieval-Augmented Generation)**
- Upload medical documents (PDF, Word, TXT, MD, JSON, XML, CSV) and get answers based on your uploaded content
- Document parsing powered by Gemini MCP for accurate text extraction
- Hierarchical document indexing with auto-merging retrieval for comprehensive context
- Mitigates hallucination by grounding responses in your documents
- Toggle RAG on/off - when disabled, provides concise clinical answers without document context
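Auto-merging retrieval is easiest to see without the framework. The sketch below assumes a two-level hierarchy in which every retrieved leaf chunk knows its parent. `Chunk`, `auto_merge`, and the default `merge_threshold=0.5` are hypothetical names for illustration, not this app's code. When the share of a parent's children that were retrieved exceeds the threshold, those children are swapped for the whole parent passage, so the model sees one coherent block instead of fragments.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Chunk:
    """A node in a two-level document hierarchy (hypothetical structure)."""
    node_id: str
    text: str
    parent_id: Optional[str] = None  # None for top-level (parent) nodes


def auto_merge(retrieved: List[Chunk],
               nodes_by_id: Dict[str, Chunk],
               children_of: Dict[str, List[str]],
               merge_threshold: float = 0.5) -> List[Chunk]:
    """Swap retrieved leaf chunks for their parent when enough siblings were hit.

    retrieved:    leaf chunks returned by a similarity search
    nodes_by_id:  every node (leaves and parents) keyed by id
    children_of:  parent id -> ids of all of its children
    """
    hits_per_parent: Dict[str, List[Chunk]] = defaultdict(list)
    merged: List[Chunk] = []
    for leaf in retrieved:
        if leaf.parent_id is None:
            merged.append(leaf)  # nothing to merge into
        else:
            hits_per_parent[leaf.parent_id].append(leaf)

    for parent_id, hits in hits_per_parent.items():
        coverage = len(hits) / len(children_of[parent_id])
        if coverage > merge_threshold:
            # Most of the parent is relevant: return the full passage instead.
            merged.append(nodes_by_id[parent_id])
        else:
            merged.extend(hits)  # keep only the individual leaf chunks
    return merged
```

In LlamaIndex the same idea is provided by `HierarchicalNodeParser` and `AutoMergingRetriever`; how the "merge threshold" setting under Advanced Configuration maps onto that retriever is an assumption here.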


### 🌐 **Web Search Integration (MCP Protocol)**
- **Native MCP Support**: Uses Model Context Protocol (MCP) tools for web search and content extraction
- **Automatic Fallback**: Gracefully falls back to direct library calls if MCP is not configured
- **Configurable MCP Servers**: Connect to any MCP-compatible search server via environment variables
- **Content Extraction**: Automatically fetches and extracts full content from search results using MCP tools
- **Automatic Summarization**: Summarizes web search results using Gemini MCP
- **Enriches Context**: Combines document RAG + web sources for comprehensive answers
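The fallback behaviour described above can be sketched as a small wrapper. Both `mcp_search` and `direct_search` are hypothetical callables: in the real app the first would invoke an MCP search tool and the second would wrap a library such as `ddgs`, whose exact call signature is deliberately not shown here.

```python
import logging
from typing import Callable, Dict, List, Optional

logger = logging.getLogger(__name__)

# A search function takes (query, max_results) and returns result dicts.
SearchFn = Callable[[str, int], List[Dict[str, str]]]


def search_with_fallback(query: str,
                         max_results: int,
                         mcp_search: Optional[SearchFn],
                         direct_search: SearchFn) -> List[Dict[str, str]]:
    """Use the MCP search tool when configured; otherwise, or on error, call the library directly."""
    if mcp_search is not None:
        try:
            return mcp_search(query, max_results)
        except Exception as exc:  # e.g. server unreachable or tool call failed
            logger.warning("MCP web search failed (%s); falling back", exc)
    # MCP not configured, or the MCP call raised: use the direct library path.
    return direct_search(query, max_results)
```

Passing `None` for `mcp_search` models the "MCP not configured" case, so callers need no separate branch.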


### 🧠 **MedSwin Medical Specialist Models**

MedLLM Agent supports multiple MedSwin models, each optimized for different use cases:

- **[MedSwin DT](https://huggingface.co/MedSwin/MedSwin-Merged-DaRE-TIES-KD-0.7)** (85% acc) - **Default model**. DaRE-TIES merged model combining Supervised Fine-Tuning (SFT) and Knowledge Distillation (KD) techniques. Best overall performance.
- **[MedSwin Nsl](https://huggingface.co/MedSwin/MedSwin-Merged-NuSLERP-KD-0.7)** (85% acc) - NuSLERP merged model from SFT and KD. Alternative high-performance option with different merging strategy.
- **[MedSwin DL](https://huggingface.co/MedSwin/MedSwin-Merged-DaRE-Linear-KD-0.7)** (84% acc) - DaRE-Linear merged model using linear interpolation for knowledge distillation.
- **[MedSwin Ti](https://huggingface.co/MedSwin/MedSwin-Merged-TIES-KD-0.7)** (84% acc) - TIES merged model from SFT and KD. Compact merged architecture.
- **[MedSwin TA](https://huggingface.co/MedSwin/MedSwin-Merged-TA-SFT-0.7)** (84% acc) - Task-Arithmetic merged model from SFT. Optimized for task-specific performance.
- **[MedSwin SFT](https://huggingface.co/MedSwin/MedSwin-7B-SFT)** (82% acc) - Supervised Fine-Tuned model. Base fine-tuned model without merging.
- **[MedSwin KD](https://huggingface.co/MedSwin/MedSwin-7B-KD)** (83% acc) - Knowledge Distillation model from MedGemma-27b-it. Distilled knowledge from larger model.

**Model Features:**
- Models download on-demand for efficient resource usage
- Fine-tuned on MedAlpaca-7B for medical domain expertise
- All models support GPU acceleration for fast inference
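On-demand model loading can be pictured as a registry that loads each model the first time it is selected. The repo IDs below are the ones listed above; `LazyModelRegistry` and the injected `loader` are hypothetical, not the app's actual loading code.

```python
from typing import Any, Callable, Dict

# Display name -> Hugging Face repo id, as listed in this README.
MEDSWIN_MODELS: Dict[str, str] = {
    "MedSwin DT": "MedSwin/MedSwin-Merged-DaRE-TIES-KD-0.7",
    "MedSwin Nsl": "MedSwin/MedSwin-Merged-NuSLERP-KD-0.7",
    "MedSwin DL": "MedSwin/MedSwin-Merged-DaRE-Linear-KD-0.7",
    "MedSwin Ti": "MedSwin/MedSwin-Merged-TIES-KD-0.7",
    "MedSwin TA": "MedSwin/MedSwin-Merged-TA-SFT-0.7",
    "MedSwin SFT": "MedSwin/MedSwin-7B-SFT",
    "MedSwin KD": "MedSwin/MedSwin-7B-KD",
}
DEFAULT_MODEL = "MedSwin DT"


class LazyModelRegistry:
    """Load each MedSwin model on first use and reuse it afterwards."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader  # maps a repo id to a loaded model
        self._cache: Dict[str, Any] = {}

    def get(self, name: str = DEFAULT_MODEL) -> Any:
        if name not in MEDSWIN_MODELS:
            raise KeyError(f"Unknown model {name!r}; choose one of {sorted(MEDSWIN_MODELS)}")
        if name not in self._cache:  # download/load only when first requested
            self._cache[name] = self._loader(MEDSWIN_MODELS[name])
        return self._cache[name]
```

A real loader might wrap `transformers.AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")`; whether the app loads models this way is not stated in this README. Caching every selected model keeps it in memory, so a GPU-constrained deployment may prefer to evict the previous one.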


### 🌍 **Multi-Language Support**
- Automatic language detection
- Non-English queries automatically translated to English
- The MedSwin model processes the query in English
- Responses translated back to original language
- Powered by Gemini MCP for translation
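The translation round trip amounts to four steps: detect, translate to English, answer, translate back. In this sketch `detect` could be `langdetect.detect` (which returns ISO 639-1 codes such as `"en"`), while `translate` and `answer_en` are hypothetical stand-ins for the Gemini MCP translation tool and the MedSwin call.

```python
from typing import Callable


def answer_in_user_language(query: str,
                            detect: Callable[[str], str],
                            translate: Callable[[str, str, str], str],
                            answer_en: Callable[[str], str]) -> str:
    """Detect the language, answer in English, and reply in the user's language."""
    lang = detect(query)
    if lang == "en":
        return answer_en(query)  # no translation needed
    english_query = translate(query, lang, "en")          # source -> English
    english_answer = answer_en(english_query)             # model works in English
    return translate(english_answer, "en", lang)          # English -> source
```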


### 🧾 **Clinical Intake Q&A Breakdown**
- Gemini intake triage checks whether the user’s concern needs additional questioning (up to 5 follow-ups) and keeps per-session state.
- Intake agent conducts focused Q&A, then auto-summarizes the transcript into patient profile, refined problem statement, and key findings with actionable handoff notes.
- Pipeline injects both the structured insights and raw transcript back into the supervisor so downstream planning, RAG, and search stay grounded in what the patient actually said.
- Users can toggle the intake flow directly in the UI; disabling clears any pending follow-up state.  
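Per-session intake state with a follow-up cap could look like the sketch below. The cap of 5 comes from this README; `IntakeSession`, `get_session`, and the choice to clear the partial transcript on reset are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

MAX_FOLLOW_UPS = 5  # README: up to 5 follow-up questions


@dataclass
class IntakeSession:
    """Intake state for one chat session: a pending question plus the Q&A transcript."""
    transcript: List[Tuple[str, str]] = field(default_factory=list)
    pending_question: Optional[str] = None

    def can_ask_more(self) -> bool:
        return len(self.transcript) < MAX_FOLLOW_UPS and self.pending_question is None

    def ask(self, question: str) -> None:
        if not self.can_ask_more():
            raise RuntimeError("Follow-up limit reached or a question is still pending")
        self.pending_question = question

    def record_answer(self, answer: str) -> None:
        if self.pending_question is None:
            raise RuntimeError("No pending question to answer")
        self.transcript.append((self.pending_question, answer))
        self.pending_question = None

    def reset(self) -> None:
        """Intake toggled off: clear the pending question and the partial transcript."""
        self.transcript.clear()
        self.pending_question = None


sessions: Dict[str, IntakeSession] = {}


def get_session(session_id: str) -> IntakeSession:
    # One IntakeSession per session id, created on first access.
    return sessions.setdefault(session_id, IntakeSession())
```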

A sample [conversation](sample.md) between the MAC system and a patient is recorded.

### 🎤 **Voice Features**
- **Speech-to-Text**: Voice input transcription using Whisper large-v3-turbo (Hugging Face) with Gemini MCP fallback
- **Inline Mic Experience**: Built-in microphone widget with live recording timer that drops transcripts straight into the chat box
- **Text-to-Speech**: Voice output generation using the Maya1 TTS model (optional; falls back to MCP if unavailable), plus a one-click "Play Response" control for the latest answer
- **Model Status Display**: Real-time status for MedSwin, TTS (maya1), and ASR (Whisper) models


### 🤝 **MAC Architecture (Multi-Agent Collaboration)**
- **Gemini Supervisor**: Orchestrates query processing by breaking each query into a flexible number of sub-topics (up to 10, depending on complexity) that explore different approaches
- **MedSwin Specialist**: Executes tasks sequentially, providing concise clinical answers
- **Enhanced Synthesis**: Supervisor synthesizes all MedSwin responses with clear context into comprehensive final answers
- **Iterative Improvement**: Supervisor challenges and enhances answers until confirmed optimal (up to 2 iterations)
- **History-Aware Follow-Ups**: In agentic modes, Gemini uses the last assistant answer when breaking down follow-up queries (e.g., "clarify your answer") so subtasks stay grounded in prior responses
- **Search Mode**: Gemini creates 1-4 search strategies → executes ddgs searches (1-2 sources each) → summarizes briefly
- **Conditional Search Trigger**: When search mode is enabled, the supervisor can trigger additional searches if the answer is unclear or has gaps
- **RAG Mode**: Gemini brainstorms retrieved documents into 1-4 short contexts for MedSwin decision-making
- **Clean Output**: All internal thoughts/conversations are logged only; users see only the final answer
- **Markdown Format**: Final answers use bullet points (tables automatically converted)
- **Deterministic Mode**: `Disable agentic reasoning` switch runs MedSwin alone for offline-friendly, model-only answers
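The supervisor/specialist loop above reduces to a short control flow. In this framework-free sketch the five callables are hypothetical stand-ins for the Gemini supervisor prompts and the GPU-tagged MedSwin call; the limits (10 sub-topics, 2 enhancement rounds) come from this README. Context preparation for search/RAG and the conditional search trigger are left out for brevity.

```python
from typing import Callable, List, Optional, Tuple

MAX_SUBTASKS = 10            # README: up to 10 sub-topics
MAX_ENHANCE_ITERATIONS = 2   # README: up to 2 challenge/enhance rounds

Findings = List[Tuple[str, str]]  # (sub-topic, MedSwin answer)


def run_mac(query: str,
            breakdown: Callable[[str], List[str]],
            specialist: Callable[[str], str],
            synthesize: Callable[[str, Findings], str],
            critique: Callable[[str, str], Optional[str]],
            enhance: Callable[[str, str], str]) -> str:
    """Supervisor plans, specialist answers each sub-topic, supervisor synthesizes and refines."""
    # 1. Supervisor splits the query; the number of sub-topics adapts to complexity.
    subtopics = breakdown(query)[:MAX_SUBTASKS]
    # 2. Specialist answers sub-topics one at a time (sequential, not parallel).
    findings: Findings = [(topic, specialist(topic)) for topic in subtopics]
    # 3. Supervisor merges all answers into one draft.
    answer = synthesize(query, findings)
    # 4. Challenge loop: stop when the critique returns None or the limit is hit.
    for _ in range(MAX_ENHANCE_ITERATIONS):
        instructions = critique(query, answer)
        if instructions is None:
            break
        answer = enhance(answer, instructions)
    # Intermediate findings stay internal; only the final answer is returned.
    return answer
```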


### ⚡ **Adaptive Strategy & Diagnostics**
- **Autonomous Planner**: Gemini reasoning enables or disables RAG and web search dynamically per query while respecting user toggles.
- **Parallel Search Flow**: Multi-strategy web lookups run concurrently with cached MCP tool discovery and shared embeddings to cut latency.
- **Pipeline Telemetry**: Every session logs stage durations, strategy decisions, and search outcomes for fast troubleshooting and quality tracking.
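Concurrent multi-strategy search with a telemetry record can be sketched with `asyncio.gather`. The limits of 4 strategies and 2 sources each follow this README; `run_search_strategies`, the async `search` callable, and the telemetry field names are hypothetical. A blocking search library would need to be wrapped, for example with `asyncio.to_thread`.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, List, Tuple

MAX_STRATEGIES = 4          # README: 1-4 search strategies per query
SOURCES_PER_STRATEGY = 2    # README: 1-2 sources each

AsyncSearch = Callable[[str, int], Awaitable[List[Any]]]


async def run_search_strategies(strategies: List[str],
                                search: AsyncSearch) -> Tuple[List[Any], Dict[str, Any]]:
    """Run the strategies concurrently and return (sources, telemetry record)."""
    start = time.perf_counter()
    selected = strategies[:MAX_STRATEGIES]
    # All searches run at once; return_exceptions=True returns failures as
    # values instead of raising the first one, so other results are kept.
    results = await asyncio.gather(
        *(search(query, SOURCES_PER_STRATEGY) for query in selected),
        return_exceptions=True,
    )
    sources: List[Any] = []
    failed = 0
    for result in results:
        if isinstance(result, BaseException):
            failed += 1
            continue
        sources.extend(result[:SOURCES_PER_STRATEGY])
    telemetry = {
        "stage": "web_search",
        "strategies": len(selected),
        "failed": failed,
        "sources": len(sources),
        "duration_s": round(time.perf_counter() - start, 3),
    }
    return sources, telemetry
```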


### ⚙️ **Advanced Configuration**
- Customizable generation parameters (temperature, top-p, top-k)
- Adjustable retrieval settings (top-k, merge threshold)
- Higher max-token limit to prevent truncated answers
- Custom EOS handling for medical models
- Dynamic system prompts based on RAG status
- One-click agentic toggle to run MedSwin alone (no RAG/web search) for deterministic, offline-safe answers
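One way to read "respecting user toggles" together with the agentic switch is that the planner may turn off a source the user allowed but never turn on one the user disabled. The sketch below encodes that reading; it is an assumption, and `Modes` and `resolve_modes` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Modes:
    rag: bool
    web_search: bool


def resolve_modes(user_rag: bool, user_search: bool, agentic: bool,
                  planner_rag: bool, planner_search: bool) -> Modes:
    """Combine the user's toggles with the planner's per-query decision.

    Assumed policy: the planner can only narrow what the user allowed.
    With agentic reasoning disabled, MedSwin runs alone (no RAG, no web search).
    """
    if not agentic:
        return Modes(rag=False, web_search=False)
    return Modes(rag=user_rag and planner_rag,
                 web_search=user_search and planner_search)
```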


## 🚀 Usage

1. **Upload Documents**: Drag and drop PDF, Word, or text files containing medical information
2. **Configure Settings**:
   - Enable/disable Document RAG
   - Enable/disable Web Search (MCP)
   - Select medical model (MedSwin DT/Nsl/DL/Ti/TA/SFT/KD)
3. **Ask Questions**: Type your medical question in any language
4. **Get Answers**: Receive comprehensive answers based on:
   - Your uploaded documents (if RAG enabled)
   - Web sources (if web search enabled)
   - Medical model's training knowledge


## 🔧 Technical Details

- **Medical Models**: MedSwin/MedSwin-7B-SFT, MedSwin-7B-KD, MedSwin-Merged-TA-SFT-0.7, MedSwin-Merged-NuSLERP-KD-0.7, MedSwin-Merged-DaRE-TIES-KD-0.7, MedSwin-Merged-DaRE-Linear-KD-0.7, MedSwin-Merged-TIES-KD-0.7
- **Architecture**: MAC (Multi-Agent Collaboration) - Gemini Supervisor + MedSwin Specialist
- **Translation**: Gemini MCP (gemini-2.5-flash-lite)
- **Document Parsing**: Gemini MCP (PDF, Word, TXT, MD, JSON, XML, CSV)
- **Speech-to-Text**: openai/whisper-large-v3-turbo (Hugging Face, primary) with Gemini MCP fallback
- **Supervisor Tasks**: Gemini MCP (gemini-2.5-flash) - query breakdown, search strategies, RAG brainstorming
- **MedSwin Execution**: GPU-tagged tasks for efficient inference
- **Text-to-Speech**: maya-research/maya1 (optional, with MCP fallback)
- **Embedding Model**: abhinand/MedEmbed-large-v0.1 (domain-tuned medical embeddings)
- **RAG Framework**: LlamaIndex with hierarchical node parsing and auto-merging retrieval
- **Web Search**: MCP tools with automatic fallback to DuckDuckGo
- **MCP Server**: Bundled Python-based Gemini MCP server (agent.py)


## 📋 Requirements

See `requirements.txt` for full dependency list. Key dependencies:
- **MCP Integration**: `mcp`, `nest-asyncio`, `google-genai` (for Gemini MCP server)
- **Fallback Dependencies**: `requests`, `beautifulsoup4`, `ddgs` (used when MCP web search unavailable)
- **Core ML**: `transformers`, `torch`, `accelerate`, `torchaudio`
- **RAG Framework**: `llama-index`, `llama_index.llms.huggingface`, `llama_index.embeddings.huggingface`
- **Utilities**: `langdetect`, `gradio`, `spaces`, `soundfile`
- **TTS**: Optional - `TTS` package (voice features work with MCP fallback if unavailable)
- **ASR**: Whisper via `transformers` (openai/whisper-large-v3-turbo from Hugging Face)


### 🔌 MCP Configuration
The application uses a bundled Gemini MCP server (agent.py) for translation, document parsing, transcription, and summarization. It is configured through environment variables such as `GEMINI_API_KEY`.

**Setup Steps:**
1. **Install Dependencies** (already in requirements.txt):
   ```bash
   pip install mcp nest-asyncio google-genai
   ```  
2. **Get Gemini API Key**:
   - Visit [Google AI Studio](https://aistudio.google.com/) to get your API key
   - Set it: `export GEMINI_API_KEY="your-api-key"`  
3. **Run the Application**:
   - The bundled MCP server (agent.py) will be used automatically
   - No additional MCP server installation required

**Note**: The application requires Gemini MCP for translation, document parsing, and summarization, and uses it as the fallback for transcription. Web search falls back to a direct DuckDuckGo search (via `ddgs`) if MCP web search tools are unavailable.
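A missing API key is easier to diagnose if it is checked before the MCP server starts. This sketch builds the launch command and environment for `agent.py`; the function name is hypothetical, and launching the server as a stdio subprocess is an assumption this README does not state.

```python
import os
import sys
from typing import Dict, List, Tuple


def gemini_mcp_server_launch(script: str = "agent.py") -> Tuple[List[str], Dict[str, str]]:
    """Return (command, env) for starting the bundled Gemini MCP server as a subprocess.

    Fails early with a clear message if GEMINI_API_KEY is missing, instead of
    letting the first translation or parsing call fail later.
    """
    if not os.environ.get("GEMINI_API_KEY"):
        raise RuntimeError(
            "GEMINI_API_KEY is not set; create a key in Google AI Studio and export it"
        )
    command = [sys.executable, script]  # run agent.py with the current interpreter
    env = dict(os.environ)              # the child process inherits the API key
    return command, env
```

With the `mcp` Python SDK, these values could be passed as `StdioServerParameters(command=command[0], args=command[1:], env=env)` to `stdio_client`, provided the server speaks MCP over stdio.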


## 🎯 Use Cases

- **Clinical Decision Support**: Evidence-based answers from documents and current medical literature
- **Medical Document Q&A**: Query uploaded patient records, research papers, and clinical guidelines
- **Multi-Language Consultations**: Automatic translation for international patient care
- **Research Assistance**: Synthesize information from multiple medical sources
- **Drug Information**: Comprehensive drug information with interaction analysis


## 🏥 Enterprise-Level Clinical Decision Support

### **Empowering Medical Specialists with AI-Powered Decision Support**  
MedLLM Agent is designed to support **doctors, clinicians, and medical specialists** in making informed clinical decisions by leveraging the power of Large Language Models (LLMs) and Model Context Protocol (MCP). This system transforms how medical professionals access, analyze, and synthesize medical information in real time.

### **Key Enterprise Capabilities**  
#### 1. **MAC Architecture (Multi-Agent Collaboration)**
- **Gemini Supervisor Agent**: 
  - Breaks user queries into flexible sub-topics (up to 10 based on complexity, explores different approaches/angles)
  - Synthesizes all MedSwin responses with clear context into comprehensive final answers
  - Challenges and enhances answers iteratively until confirmed optimal (up to 2 iterations)
  - In search mode: creates 1-4 search strategies, executes ddgs (1-2 sources each), summarizes briefly
  - Conditional search trigger: Can trigger additional searches if answer is unclear or has gaps (only when search mode enabled)
  - In RAG mode: brainstorms retrieved documents into 1-4 concise contexts
  - All supervisor decisions logged internally, not displayed  
- **MedSwin Specialist Agent**:
  - Executes tasks assigned by Gemini Supervisor (GPU-tagged)
  - Processes each sub-topic sequentially with focused context
  - Generates concise, clinically accurate answers
  - Returns Markdown format with bullet points (tables auto-converted)

#### 2. **Clean User Experience**
- **Internal Thoughts Hidden**: All Gemini-MedSwin conversations logged only
- **Final Answer Only**: Users see only the polished, final answer
- **Structured Output**: Markdown bullets, no internal planning tables
- **Efficient Processing**: Contexts kept brief to respect token limits  
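Keeping contexts brief can be as simple as packing at most four snippets under a length budget. This sketch counts whitespace-separated words as a rough proxy for tokens; `pack_contexts` and the 300-word default are hypothetical, and a real implementation would more likely measure length with the model's tokenizer.

```python
from typing import List


def pack_contexts(contexts: List[str], max_contexts: int = 4, word_budget: int = 300) -> List[str]:
    """Keep at most `max_contexts` snippets whose combined word count fits the budget."""
    packed: List[str] = []
    used = 0
    for ctx in contexts[:max_contexts]:
        remaining = word_budget - used
        if remaining <= 0:
            break  # budget exhausted
        words = ctx.split()
        if len(words) > remaining:
            words = words[:remaining]  # truncate the last snippet rather than drop it
        packed.append(" ".join(words))
        used += len(words)
    return packed
```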


### **Enterprise Use Cases for Medical Specialists**  

#### **Clinical Decision Support**
- **Diagnostic Assistance**: Upload patient records, lab results, and medical histories. Ask complex diagnostic questions and receive evidence-based answers grounded in your documents and current medical literature.  
- **Treatment Planning**: Query treatment protocols, drug interactions, and therapeutic guidelines. The system autonomously retrieves relevant information from your clinical documents and current web sources.  
- **Drug Information & Interactions**: Get comprehensive drug information, contraindications, and interaction analyses by combining your formulary documents with up-to-date web sources.

#### **Research & Evidence Synthesis**
- **Literature Review Support**: Upload research papers, clinical trials, and medical literature. The system helps synthesize findings, identify connections, and answer research questions.  
- **Clinical Guideline Analysis**: Compare and analyze multiple clinical guidelines, protocols, and best practices from your document library.

#### **Multi-Language Clinical Support**
- **International Patient Care**: Handle queries in multiple languages. The system automatically translates, processes with medical models, and translates responses back—enabling care for diverse patient populations.

#### **Real-Time Information Access**
- **Current Medical Knowledge**: Leverage MCP web search to access:
  - Latest treatment protocols
  - Recent clinical trial results
  - Updated drug information
  - Current medical guidelines
- **MCP Protocol Benefits**: Standardized, modular tool integration makes it easy to switch search providers and improves reliability


### **How It Works: MAC Architecture in Action**  
1. **Gemini Supervisor - Query Breakdown** → Analyzes the query and breaks it into flexible sub-topics (up to 10, depending on complexity):
   - Example: "What are the treatment options for Type 2 diabetes in elderly patients with renal impairment?"
   - Explores different approaches (clinical, diagnostic, treatment, prevention perspectives)
   - Creates structured sub-topics: treatment options, age considerations, renal function impact, drug interactions, monitoring protocols
   - Number of subtasks adapts to query complexity (not limited to 4)
   - All planning logged internally, not displayed to user  
2. **Gemini Supervisor - Context Preparation**:
   - **Search Mode**: Creates 1-4 search strategies → executes ddgs (1-2 sources each) → summarizes briefly
   - **RAG Mode**: Retrieves documents → brainstorms into 1-4 concise contexts for MedSwin
   - Contexts kept brief to respect MedSwin token limits    
3. **MedSwin Specialist - Task Execution** (GPU-tagged):
   - Executes each sub-topic task sequentially
   - Receives focused context from Gemini Supervisor
   - Generates concise clinical answers (Markdown bullets, no tables)
   - All execution logged internally  
4. **Gemini Supervisor - Answer Synthesis**:
   - Synthesizes all MedSwin responses with clear context
   - Integrates information from all sub-topics seamlessly
   - Creates coherent, comprehensive final answer
   - Provides better context than simple concatenation  
5. **Gemini Supervisor - Challenge & Enhancement Loop**:
   - Evaluates answer quality (completeness, accuracy, clarity)
   - Challenges answer if not optimal
   - Provides specific enhancement instructions
   - Enhances answer iteratively (up to 2 iterations)
   - Stops once the answer is confirmed optimal or the iteration limit is reached  
6. **Conditional Search Trigger** (only when search mode enabled):
   - Supervisor checks if answer is unclear or has gaps
   - If needed, generates specific search queries to fill gaps
   - Executes additional searches
   - Enhances answer with new search context  
7. **Final Answer Assembly**:
   - Converts any tables to Markdown bullets
   - Adds citations if web sources used
   - Translates back if needed
   - **Only final answer displayed** - all internal thoughts remain in logs
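The table-to-bullets step in final answer assembly could work as below for pipe tables whose rows start with `|`. The output layout (first column as a bold label, remaining cells as `Header: value` pairs) and the function name are assumptions, not the app's actual formatter.

```python
import re
from typing import List

# Matches a Markdown table separator row such as |---|:---:|---|
_SEPARATOR = re.compile(r"^\s*\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)*\|?\s*$")


def _cells(line: str) -> List[str]:
    return [c.strip() for c in line.strip().strip("|").split("|")]


def tables_to_bullets(markdown: str) -> str:
    """Rewrite pipe tables as bullet lists; non-table lines pass through unchanged."""
    lines = markdown.splitlines()
    out: List[str] = []
    i = 0
    while i < len(lines):
        is_table = (
            lines[i].lstrip().startswith("|")
            and i + 1 < len(lines)
            and _SEPARATOR.match(lines[i + 1])
        )
        if not is_table:
            out.append(lines[i])
            i += 1
            continue
        headers = _cells(lines[i])
        i += 2  # skip the header and separator rows
        while i < len(lines) and lines[i].lstrip().startswith("|"):
            row = _cells(lines[i])
            # First cell becomes the label; empty cells are skipped.
            details = "; ".join(f"{h}: {v}" for h, v in zip(headers[1:], row[1:]) if v)
            out.append(f"- **{row[0]}**" + (f": {details}" if details else ""))
            i += 1
    return "\n".join(out)
```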


### **Enterprise Benefits**
✅ **Time Efficiency**: Reduces time spent searching through documents and medical databases  
✅ **Evidence-Based Decisions**: Grounds answers in uploaded documents and current medical literature  
✅ **Reduced Hallucination**: RAG grounds answers in the uploaded documents and retrieved sources  
✅ **Comprehensive Coverage**: Combines institutional knowledge (documents) with current medical knowledge (web)  
✅ **Enhanced Quality**: An iterative challenge loop reviews and refines answers before delivery  
✅ **Flexible Task Breakdown**: Adapts to query complexity with flexible subtask generation (not limited to 4 steps)  
✅ **Intelligent Search**: Conditional search trigger fills gaps when answers are unclear  
✅ **Better Context**: Enhanced synthesis provides clearer, more comprehensive final answers  
✅ **Scalability**: Handles multiple languages, complex queries, and large document libraries  
✅ **Clinical Workflow Integration**: Designed to fit into existing clinical decision-making processes  
✅ **MCP Protocol**: Standardized tool integration for reliable, maintainable web search capabilities


### **Implementation in Clinical Settings**
- **Hospital Systems**: Clinical decision support with EMR integration and institutional medical libraries
- **Specialty Clinics**: Customize with specialty-specific documents and guidelines
- **Medical Education**: Comprehensive, evidence-based answers for training and education
- **Research Institutions**: Accelerate research by synthesizing information from multiple sources

---

**⚠️ Important Disclaimer**: This system is designed to **assist** medical professionals with information retrieval and synthesis. It does not replace clinical judgment. All medical decisions must be made by qualified healthcare professionals who consider the full clinical context, patient-specific factors, and their professional expertise.

---

> **Built for MCP-1st-Birthday Hackathon**: Enterprise-level clinical decision support system integrating MCP protocol, document RAG, and autonomous reasoning capabilities.