Spaces:

MCP-1st-Birthday
/

MedLLM-Agent

Running on Zero

App Files Files Community

LiamKhoaLe commited on Nov 16, 2025

Commit

7415155

1 Parent(s): 109bcee

Upd autonomous reasoning, planning, and execution.

Browse files

Files changed (2) hide show

README.md +110 -2
app.py +303 -15

README.md CHANGED Viewed

@@ -90,8 +90,116 @@ See `requirements.txt` for full dependency list. Key dependencies:
 - Multi-language medical consultations
 - Evidence-based medical answers
 ---
-**Note**: This system is designed to assist with medical information retrieval. Always consult qualified healthcare professionals for medical decisions.
-> Introduction: A medical app for MCP-1st-Birthday hackathon, integrate MCP searcher and document RAG

 - Multi-language medical consultations
 - Evidence-based medical answers
+## 🏥 Enterprise-Level Clinical Decision Support
+### **Empowering Medical Specialists with AI-Powered Decision Support**
+MedLLM Agent is designed to support **doctors, clinicians, and medical specialists** in making informed clinical decisions by leveraging the power of Large Language Models (LLMs) and Model Context Protocol (MCP). This system transforms how medical professionals access, analyze, and synthesize medical information in real-time.
+### **Key Enterprise Capabilities**
+#### 1. **Autonomous Reasoning & Planning**
+- **Intelligent Query Analysis**: The system autonomously analyzes medical queries to understand:
+  - Query type (diagnosis, treatment, drug information, symptom analysis)
+  - Complexity level (simple, moderate, complex, multi-faceted)
+  - Information requirements and data sources needed
+- **Multi-Step Execution Planning**: For complex clinical questions, the system:
+  - Breaks down queries into sub-questions
+  - Creates structured execution plans
+  - Determines optimal information gathering strategies
+  - Adapts approach based on query complexity
+#### 2. **Autonomous Decision-Making**
+- **Smart Resource Selection**: The system autonomously decides:
+  - When to use document RAG vs. web search
+  - When both sources are needed for comprehensive answers
+  - Optimal retrieval parameters based on query characteristics
+- **Context-Aware Execution**: Automatically:
+  - Overrides user settings when reasoning suggests better approaches
+  - Combines multiple information sources intelligently
+  - Prioritizes evidence-based medical sources
+#### 3. **Self-Reflection & Quality Assurance**
+- **Answer Quality Evaluation**: For complex queries, the system:
+  - Self-evaluates answer completeness and accuracy
+  - Identifies missing information or aspects
+  - Provides improvement suggestions
+  - Ensures high-quality clinical responses
+### **Enterprise Use Cases for Medical Specialists**
+#### **Clinical Decision Support**
+- **Diagnostic Assistance**: Upload patient records, lab results, and medical histories. Ask complex diagnostic questions and receive evidence-based answers grounded in your documents and current medical literature.
+- **Treatment Planning**: Query treatment protocols, drug interactions, and therapeutic guidelines. The system autonomously retrieves relevant information from your clinical documents and current medical databases.
+- **Drug Information & Interactions**: Get comprehensive drug information, contraindications, and interaction analyses by combining your formulary documents with up-to-date web sources.
+#### **Research & Evidence Synthesis**
+- **Literature Review Support**: Upload research papers, clinical trials, and medical literature. The system helps synthesize findings, identify connections, and answer research questions.
+- **Clinical Guideline Analysis**: Compare and analyze multiple clinical guidelines, protocols, and best practices from your document library.
+#### **Multi-Language Clinical Support**
+- **International Patient Care**: Handle queries in multiple languages. The system automatically translates, processes with medical models, and translates responses back—enabling care for diverse patient populations.
+#### **Real-Time Information Access**
+- **Current Medical Knowledge**: Leverage MCP web search to access:
+  - Latest treatment protocols
+  - Recent clinical trial results
+  - Updated drug information
+  - Current medical guidelines
+### **How It Works: Autonomous Reasoning in Action**
+1. **Query Analysis** → System analyzes: "What are the treatment options for Type 2 diabetes in elderly patients with renal impairment?"
+   - Identifies as complex, multi-faceted query
+   - Determines need for both RAG (patient records) and web search (current guidelines)
+   - Breaks into sub-questions: treatment options, age considerations, renal function impact
+2. **Autonomous Planning** → Creates execution plan:
+   - Step 1: Language detection/translation
+   - Step 2: RAG retrieval from patient documents
+   - Step 3: Web search for current diabetes treatment guidelines
+   - Step 4: Multi-step reasoning for each sub-question
+   - Step 5: Synthesis of comprehensive answer
+   - Step 6: Self-reflection on answer quality
+3. **Autonomous Execution** → System executes plan:
+   - Retrieves relevant patient history from documents
+   - Searches web for latest ADA/ADA-EASD guidelines
+   - Synthesizes information considering age and renal function
+   - Generates evidence-based treatment recommendations
+4. **Self-Reflection** → Evaluates answer:
+   - Checks completeness (all sub-questions addressed?)
+   - Verifies accuracy (evidence-based?)
+   - Suggests improvements if needed
+### **Enterprise Benefits**
+✅ **Time Efficiency**: Reduces time spent searching through documents and medical databases
+✅ **Evidence-Based Decisions**: Grounds answers in uploaded documents and current medical literature
+✅ **Reduced Hallucination**: RAG ensures answers are based on actual documents and verified sources
+✅ **Comprehensive Coverage**: Combines institutional knowledge (documents) with current medical knowledge (web)
+✅ **Quality Assurance**: Self-reflection ensures high-quality, complete answers
+✅ **Scalability**: Handles multiple languages, complex queries, and large document libraries
+✅ **Clinical Workflow Integration**: Designed to fit into existing clinical decision-making processes
+### **Implementation in Clinical Settings**
+**Hospital Systems**: Deploy for clinical decision support, integrating with EMR systems and institutional medical libraries.
+**Specialty Clinics**: Customize for specific medical specialties by uploading specialty-specific documents and guidelines.
+**Medical Education**: Support medical training and education with comprehensive, evidence-based answers.
+**Research Institutions**: Accelerate medical research by synthesizing information from multiple sources.
 ---
+**Note**: This system is designed to **assist** medical professionals with information retrieval and synthesis. It does not replace clinical judgment. All medical decisions should be made by qualified healthcare professionals who consider the full clinical context, patient-specific factors, and their professional expertise.
+> Introduction: A medical app for MCP-1st-Birthday hackathon, integrating MCP searcher and document RAG with autonomous reasoning, planning, and execution capabilities for enterprise-level clinical decision support.

app.py CHANGED Viewed

@@ -5,6 +5,7 @@ import logging
 import torch
 import threading
 import time
 from transformers import (
     AutoModelForCausalLM,
     AutoTokenizer,
@@ -351,6 +352,249 @@ def get_llm_for_rag(temperature=0.7, max_new_tokens=256, top_p=0.95, top_k=50):
         }
     )
 def extract_text_from_document(file):
     file_name = file.name
     file_extension = os.path.splitext(file_name)[1].lower()
@@ -493,7 +737,32 @@ def stream_chat(
         yield history + [{"role": "assistant", "content": "Session initialization failed. Please refresh the page."}]
         return
-    # Detect language and translate if needed
     original_lang = detect_language(message)
     original_message = message
     needs_translation = original_lang != "en"
@@ -503,26 +772,27 @@ def stream_chat(
         message = translate_text(message, target_lang="en", source_lang=original_lang)
         logger.info(f"Translated query: {message}")
-    user_id = request.session_hash
-    index_dir = f"./{user_id}_index"
     # Initialize medical model
     medical_model_obj, medical_tokenizer = initialize_medical_model(medical_model)
-    # Adjust system prompt based on RAG setting
-    if use_rag:
-        if not os.path.exists(index_dir):
             yield history + [{"role": "assistant", "content": "Please upload documents first to use RAG."}]
             return
-        base_system_prompt = system_prompt if system_prompt else "As a medical specialist, provide detailed and accurate answers based on the provided medical documents."
     else:
         base_system_prompt = "As a medical specialist, provide short and concise clinical answers. Be brief and avoid lengthy explanations. Focus on key medical facts only."
-    # Get RAG context if enabled
     rag_context = ""
     source_info = ""
-    if use_rag and os.path.exists(index_dir):
         embed_model = HuggingFaceEmbedding(model_name=EMBEDDING_MODEL, token=HF_TOKEN)
         Settings.embed_model = embed_model
         storage_context = StorageContext.from_defaults(persist_dir=index_dir)
@@ -550,15 +820,15 @@ def stream_chat(
         if merged_file_sources:
             source_info = "\n\nRetrieved information from files: " + ", ".join(merged_file_sources.keys())
-    # Get web search context if enabled
     web_context = ""
     web_sources = []
-    if use_web_search:
-        logger.info("Performing web search...")
         web_results = search_web(message, max_results=5)
         if web_results:
             web_summary = summarize_web_content(web_results, message)
-            web_context = f"\n\nAdditional Web Sources:\n{web_summary}"
             web_sources = [r['title'] for r in web_results[:3]]
             logger.info(f"Web search completed, found {len(web_results)} results")
@@ -572,7 +842,7 @@ def stream_chat(
     full_context = "\n\n".join(context_parts) if context_parts else ""
     # Build system prompt
-    if use_rag or use_web_search:
         formatted_system_prompt = f"{base_system_prompt}\n\n{full_context}{source_info}"
     else:
         formatted_system_prompt = base_system_prompt
@@ -676,6 +946,24 @@ def stream_chat(
             updated_history[-1]["content"] = partial_response
             yield updated_history
         # Translate back if needed
         if needs_translation and partial_response:
             logger.info(f"Translating response back to {original_lang}...")

 import torch
 import threading
 import time
+import json
 from transformers import (
     AutoModelForCausalLM,
     AutoTokenizer,
         }
     )
+def autonomous_reasoning(query: str, history: list) -> dict:
+    """
+    Autonomous reasoning: Analyze query complexity, intent, and information needs.
+    Returns reasoning analysis with query type, complexity, and required information sources.
+    """
+    global global_translation_model, global_translation_tokenizer
+    if global_translation_model is None or global_translation_tokenizer is None:
+        initialize_translation_model()
+    reasoning_prompt = f"""Analyze this medical query and provide structured reasoning:
+Query: "{query}"
+Analyze:
+1. Query Type: (diagnosis, treatment, drug_info, symptom_analysis, research, general_info)
+2. Complexity: (simple, moderate, complex, multi_faceted)
+3. Information Needs: What specific information is required?
+4. Requires RAG: (yes/no) - Does this need document context?
+5. Requires Web Search: (yes/no) - Does this need current/updated information?
+6. Sub-questions: Break down into key sub-questions if complex
+Respond in JSON format:
+{{
+    "query_type": "...",
+    "complexity": "...",
+    "information_needs": ["..."],
+    "requires_rag": true/false,
+    "requires_web_search": true/false,
+    "sub_questions": ["..."]
+}}"""
+    messages = [
+        {"role": "system", "content": "You are a medical reasoning system. Analyze queries systematically and provide structured JSON responses."},
+        {"role": "user", "content": reasoning_prompt}
+    ]
+    prompt_text = global_translation_tokenizer.apply_chat_template(
+        messages,
+        tokenize=False,
+        add_generation_prompt=True
+    )
+    inputs = global_translation_tokenizer(prompt_text, return_tensors="pt").to(global_translation_model.device)
+    with torch.no_grad():
+        outputs = global_translation_model.generate(
+            **inputs,
+            max_new_tokens=512,
+            temperature=0.3,
+            do_sample=True,
+            pad_token_id=global_translation_tokenizer.eos_token_id
+        )
+    response = global_translation_tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
+    # Parse JSON response (with fallback)
+    try:
+        # Extract JSON from response
+        json_start = response.find('{')
+        json_end = response.rfind('}') + 1
+        if json_start >= 0 and json_end > json_start:
+            reasoning = json.loads(response[json_start:json_end])
+        else:
+            raise ValueError("No JSON found")
+    except:
+        # Fallback reasoning
+        reasoning = {
+            "query_type": "general_info",
+            "complexity": "moderate",
+            "information_needs": ["medical information"],
+            "requires_rag": True,
+            "requires_web_search": False,
+            "sub_questions": [query]
+        }
+    logger.info(f"Reasoning analysis: {reasoning}")
+    return reasoning
+def create_execution_plan(reasoning: dict, query: str, has_rag_index: bool) -> dict:
+    """
+    Planning: Create multi-step execution plan based on reasoning analysis.
+    Returns execution plan with steps and strategy.
+    """
+    plan = {
+        "steps": [],
+        "strategy": "sequential",
+        "iterations": 1
+    }
+    # Determine execution strategy
+    if reasoning["complexity"] in ["complex", "multi_faceted"]:
+        plan["strategy"] = "iterative"
+        plan["iterations"] = 2
+    # Step 1: Language detection and translation
+    plan["steps"].append({
+        "step": 1,
+        "action": "detect_language",
+        "description": "Detect query language and translate if needed"
+    })
+    # Step 2: RAG retrieval (if needed and available)
+    if reasoning.get("requires_rag", True) and has_rag_index:
+        plan["steps"].append({
+            "step": 2,
+            "action": "rag_retrieval",
+            "description": "Retrieve relevant document context",
+            "parameters": {"top_k": 15, "merge_threshold": 0.5}
+        })
+    # Step 3: Web search (if needed)
+    if reasoning.get("requires_web_search", False):
+        plan["steps"].append({
+            "step": 3,
+            "action": "web_search",
+            "description": "Search web for current/updated information",
+            "parameters": {"max_results": 5}
+        })
+    # Step 4: Sub-question processing (if complex)
+    if reasoning.get("sub_questions") and len(reasoning["sub_questions"]) > 1:
+        plan["steps"].append({
+            "step": 4,
+            "action": "multi_step_reasoning",
+            "description": "Process sub-questions iteratively",
+            "sub_questions": reasoning["sub_questions"]
+        })
+    # Step 5: Synthesis and answer generation
+    plan["steps"].append({
+        "step": len(plan["steps"]) + 1,
+        "action": "synthesize_answer",
+        "description": "Generate comprehensive answer from all sources"
+    })
+    # Step 6: Self-reflection (for complex queries)
+    if reasoning["complexity"] in ["complex", "multi_faceted"]:
+        plan["steps"].append({
+            "step": len(plan["steps"]) + 1,
+            "action": "self_reflection",
+            "description": "Evaluate answer quality and completeness"
+        })
+    logger.info(f"Execution plan created: {len(plan['steps'])} steps")
+    return plan
+def autonomous_execution_strategy(reasoning: dict, plan: dict, use_rag: bool, use_web_search: bool) -> dict:
+    """
+    Autonomous execution: Make decisions on information gathering strategy.
+    Overrides user settings if reasoning suggests better approach.
+    """
+    strategy = {
+        "use_rag": use_rag,
+        "use_web_search": use_web_search,
+        "reasoning_override": False,
+        "rationale": ""
+    }
+    # Autonomous decision: Override if reasoning suggests different approach
+    if reasoning.get("requires_rag", False) and not use_rag:
+        strategy["use_rag"] = True
+        strategy["reasoning_override"] = True
+        strategy["rationale"] += "Reasoning suggests RAG is needed for this query. "
+    if reasoning.get("requires_web_search", False) and not use_web_search:
+        strategy["use_web_search"] = True
+        strategy["reasoning_override"] = True
+        strategy["rationale"] += "Reasoning suggests web search for current information. "
+    if strategy["reasoning_override"]:
+        logger.info(f"Autonomous override: {strategy['rationale']}")
+    return strategy
+def self_reflection(answer: str, query: str, reasoning: dict) -> dict:
+    """
+    Self-reflection: Evaluate answer quality and completeness.
+    Returns reflection with quality score and improvement suggestions.
+    """
+    global global_translation_model, global_translation_tokenizer
+    if global_translation_model is None or global_translation_tokenizer is None:
+        initialize_translation_model()
+    reflection_prompt = f"""Evaluate this medical answer for quality and completeness:
+Query: "{query}"
+Answer: "{answer[:1000]}"
+Evaluate:
+1. Completeness: Does it address all aspects of the query?
+2. Accuracy: Is the medical information accurate?
+3. Clarity: Is it clear and well-structured?
+4. Sources: Are sources cited appropriately?
+5. Missing Information: What important information might be missing?
+Respond in JSON:
+{{
+    "completeness_score": 0-10,
+    "accuracy_score": 0-10,
+    "clarity_score": 0-10,
+    "overall_score": 0-10,
+    "missing_aspects": ["..."],
+    "improvement_suggestions": ["..."]
+}}"""
+    messages = [
+        {"role": "system", "content": "You are a medical answer quality evaluator. Provide honest, constructive feedback."},
+        {"role": "user", "content": reflection_prompt}
+    ]
+    prompt_text = global_translation_tokenizer.apply_chat_template(
+        messages,
+        tokenize=False,
+        add_generation_prompt=True
+    )
+    inputs = global_translation_tokenizer(prompt_text, return_tensors="pt").to(global_translation_model.device)
+    with torch.no_grad():
+        outputs = global_translation_model.generate(
+            **inputs,
+            max_new_tokens=256,
+            temperature=0.3,
+            do_sample=True,
+            pad_token_id=global_translation_tokenizer.eos_token_id
+        )
+    response = global_translation_tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
+    import json
+    try:
+        json_start = response.find('{')
+        json_end = response.rfind('}') + 1
+        if json_start >= 0 and json_end > json_start:
+            reflection = json.loads(response[json_start:json_end])
+        else:
+            reflection = {"overall_score": 7, "improvement_suggestions": []}
+    except:
+        reflection = {"overall_score": 7, "improvement_suggestions": []}
+    logger.info(f"Self-reflection score: {reflection.get('overall_score', 'N/A')}")
+    return reflection
 def extract_text_from_document(file):
     file_name = file.name
     file_extension = os.path.splitext(file_name)[1].lower()
         yield history + [{"role": "assistant", "content": "Session initialization failed. Please refresh the page."}]
         return
+    user_id = request.session_hash
+    index_dir = f"./{user_id}_index"
+    has_rag_index = os.path.exists(index_dir)
+    # ===== AUTONOMOUS REASONING =====
+    logger.info("🤔 Starting autonomous reasoning...")
+    reasoning = autonomous_reasoning(message, history)
+    # ===== PLANNING =====
+    logger.info("📋 Creating execution plan...")
+    plan = create_execution_plan(reasoning, message, has_rag_index)
+    # ===== AUTONOMOUS EXECUTION STRATEGY =====
+    logger.info("🎯 Determining execution strategy...")
+    execution_strategy = autonomous_execution_strategy(reasoning, plan, use_rag, use_web_search)
+    # Use autonomous strategy decisions
+    final_use_rag = execution_strategy["use_rag"]
+    final_use_web_search = execution_strategy["use_web_search"]
+    # Show reasoning override message if applicable
+    reasoning_note = ""
+    if execution_strategy["reasoning_override"]:
+        reasoning_note = f"\n\n💡 *Autonomous Reasoning: {execution_strategy['rationale']}*"
+    # Detect language and translate if needed (Step 1 of plan)
     original_lang = detect_language(message)
     original_message = message
     needs_translation = original_lang != "en"
         message = translate_text(message, target_lang="en", source_lang=original_lang)
         logger.info(f"Translated query: {message}")
     # Initialize medical model
     medical_model_obj, medical_tokenizer = initialize_medical_model(medical_model)
+    # Adjust system prompt based on RAG setting and reasoning
+    if final_use_rag:
+        if not has_rag_index:
             yield history + [{"role": "assistant", "content": "Please upload documents first to use RAG."}]
             return
+        base_system_prompt = system_prompt if system_prompt else "As a medical specialist, provide clinical and concise answers based on the provided medical documents and context."
     else:
         base_system_prompt = "As a medical specialist, provide short and concise clinical answers. Be brief and avoid lengthy explanations. Focus on key medical facts only."
+    # Add reasoning context to system prompt for complex queries
+    if reasoning["complexity"] in ["complex", "multi_faceted"]:
+        base_system_prompt += f"\n\nQuery Analysis: This is a {reasoning['complexity']} {reasoning['query_type']} query. Address all sub-questions: {', '.join(reasoning.get('sub_questions', [])[:3])}"
+    # ===== EXECUTION: RAG Retrieval (Step 2) =====
     rag_context = ""
     source_info = ""
+    if final_use_rag and has_rag_index:
         embed_model = HuggingFaceEmbedding(model_name=EMBEDDING_MODEL, token=HF_TOKEN)
         Settings.embed_model = embed_model
         storage_context = StorageContext.from_defaults(persist_dir=index_dir)
         if merged_file_sources:
             source_info = "\n\nRetrieved information from files: " + ", ".join(merged_file_sources.keys())
+    # ===== EXECUTION: Web Search (Step 3) =====
     web_context = ""
     web_sources = []
+    if final_use_web_search:
+        logger.info("🌐 Performing web search (MCP)...")
         web_results = search_web(message, max_results=5)
         if web_results:
             web_summary = summarize_web_content(web_results, message)
+            web_context = f"\n\nAdditional Web Sources (MCP):\n{web_summary}"
             web_sources = [r['title'] for r in web_results[:3]]
             logger.info(f"Web search completed, found {len(web_results)} results")
     full_context = "\n\n".join(context_parts) if context_parts else ""
     # Build system prompt
+    if final_use_rag or final_use_web_search:
         formatted_system_prompt = f"{base_system_prompt}\n\n{full_context}{source_info}"
     else:
         formatted_system_prompt = base_system_prompt
             updated_history[-1]["content"] = partial_response
             yield updated_history
+        # ===== SELF-REFLECTION (Step 6) =====
+        if reasoning["complexity"] in ["complex", "multi_faceted"]:
+            logger.info("🔍 Performing self-reflection on answer quality...")
+            reflection = self_reflection(partial_response, message, reasoning)
+            # Add reflection note if score is low or improvements suggested
+            if reflection.get("overall_score", 10) < 7 or reflection.get("improvement_suggestions"):
+                reflection_note = f"\n\n---\n**Self-Reflection** (Score: {reflection.get('overall_score', 'N/A')}/10)"
+                if reflection.get("improvement_suggestions"):
+                    reflection_note += f"\n💡 Suggestions: {', '.join(reflection['improvement_suggestions'][:2])}"
+                partial_response += reflection_note
+                updated_history[-1]["content"] = partial_response
+        # Add reasoning note if autonomous override occurred
+        if reasoning_note:
+            partial_response = reasoning_note + "\n\n" + partial_response
+            updated_history[-1]["content"] = partial_response
         # Translate back if needed
         if needs_translation and partial_response:
             logger.info(f"Translating response back to {original_lang}...")