LiamKhoaLe commited on
Commit
7415155
·
1 Parent(s): 109bcee

Upd autonomous reasoning, planning, and execution.

Browse files
Files changed (2) hide show
  1. README.md +110 -2
  2. app.py +303 -15
README.md CHANGED
@@ -90,8 +90,116 @@ See `requirements.txt` for full dependency list. Key dependencies:
90
  - Multi-language medical consultations
91
  - Evidence-based medical answers
92
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
93
  ---
94
 
95
- **Note**: This system is designed to assist with medical information retrieval. Always consult qualified healthcare professionals for medical decisions.
96
 
97
- > Introduction: A medical app for MCP-1st-Birthday hackathon, integrate MCP searcher and document RAG
 
90
  - Multi-language medical consultations
91
  - Evidence-based medical answers
92
 
93
+ ## 🏥 Enterprise-Level Clinical Decision Support
94
+
95
+ ### **Empowering Medical Specialists with AI-Powered Decision Support**
96
+
97
+ MedLLM Agent is designed to support **doctors, clinicians, and medical specialists** in making informed clinical decisions by leveraging the power of Large Language Models (LLMs) and Model Context Protocol (MCP). This system transforms how medical professionals access, analyze, and synthesize medical information in real-time.
98
+
99
+ ### **Key Enterprise Capabilities**
100
+
101
+ #### 1. **Autonomous Reasoning & Planning**
102
+ - **Intelligent Query Analysis**: The system autonomously analyzes medical queries to understand:
103
+ - Query type (diagnosis, treatment, drug information, symptom analysis)
104
+ - Complexity level (simple, moderate, complex, multi-faceted)
105
+ - Information requirements and data sources needed
106
+
107
+ - **Multi-Step Execution Planning**: For complex clinical questions, the system:
108
+ - Breaks down queries into sub-questions
109
+ - Creates structured execution plans
110
+ - Determines optimal information gathering strategies
111
+ - Adapts approach based on query complexity
112
+
113
+ #### 2. **Autonomous Decision-Making**
114
+ - **Smart Resource Selection**: The system autonomously decides:
115
+ - When to use document RAG vs. web search
116
+ - When both sources are needed for comprehensive answers
117
+ - Optimal retrieval parameters based on query characteristics
118
+
119
+ - **Context-Aware Execution**: Automatically:
120
+ - Overrides user settings when reasoning suggests better approaches
121
+ - Combines multiple information sources intelligently
122
+ - Prioritizes evidence-based medical sources
123
+
124
+ #### 3. **Self-Reflection & Quality Assurance**
125
+ - **Answer Quality Evaluation**: For complex queries, the system:
126
+ - Self-evaluates answer completeness and accuracy
127
+ - Identifies missing information or aspects
128
+ - Provides improvement suggestions
129
+ - Ensures high-quality clinical responses
130
+
131
+ ### **Enterprise Use Cases for Medical Specialists**
132
+
133
+ #### **Clinical Decision Support**
134
+ - **Diagnostic Assistance**: Upload patient records, lab results, and medical histories. Ask complex diagnostic questions and receive evidence-based answers grounded in your documents and current medical literature.
135
+
136
+ - **Treatment Planning**: Query treatment protocols, drug interactions, and therapeutic guidelines. The system autonomously retrieves relevant information from your clinical documents and current medical databases.
137
+
138
+ - **Drug Information & Interactions**: Get comprehensive drug information, contraindications, and interaction analyses by combining your formulary documents with up-to-date web sources.
139
+
140
+ #### **Research & Evidence Synthesis**
141
+ - **Literature Review Support**: Upload research papers, clinical trials, and medical literature. The system helps synthesize findings, identify connections, and answer research questions.
142
+
143
+ - **Clinical Guideline Analysis**: Compare and analyze multiple clinical guidelines, protocols, and best practices from your document library.
144
+
145
+ #### **Multi-Language Clinical Support**
146
+ - **International Patient Care**: Handle queries in multiple languages. The system automatically translates, processes with medical models, and translates responses back—enabling care for diverse patient populations.
147
+
148
+ #### **Real-Time Information Access**
149
+ - **Current Medical Knowledge**: Leverage MCP web search to access:
150
+ - Latest treatment protocols
151
+ - Recent clinical trial results
152
+ - Updated drug information
153
+ - Current medical guidelines
154
+
155
+ ### **How It Works: Autonomous Reasoning in Action**
156
+
157
+ 1. **Query Analysis** → System analyzes: "What are the treatment options for Type 2 diabetes in elderly patients with renal impairment?"
158
+ - Identifies as complex, multi-faceted query
159
+ - Determines need for both RAG (patient records) and web search (current guidelines)
160
+ - Breaks into sub-questions: treatment options, age considerations, renal function impact
161
+
162
+ 2. **Autonomous Planning** → Creates execution plan:
163
+ - Step 1: Language detection/translation
164
+ - Step 2: RAG retrieval from patient documents
165
+ - Step 3: Web search for current diabetes treatment guidelines
166
+ - Step 4: Multi-step reasoning for each sub-question
167
+ - Step 5: Synthesis of comprehensive answer
168
+ - Step 6: Self-reflection on answer quality
169
+
170
+ 3. **Autonomous Execution** → System executes plan:
171
+ - Retrieves relevant patient history from documents
172
+ - Searches web for latest ADA/ADA-EASD guidelines
173
+ - Synthesizes information considering age and renal function
174
+ - Generates evidence-based treatment recommendations
175
+
176
+ 4. **Self-Reflection** → Evaluates answer:
177
+ - Checks completeness (all sub-questions addressed?)
178
+ - Verifies accuracy (evidence-based?)
179
+ - Suggests improvements if needed
180
+
181
+ ### **Enterprise Benefits**
182
+
183
+ ✅ **Time Efficiency**: Reduces time spent searching through documents and medical databases
184
+ ✅ **Evidence-Based Decisions**: Grounds answers in uploaded documents and current medical literature
185
+ ✅ **Reduced Hallucination**: RAG ensures answers are based on actual documents and verified sources
186
+ ✅ **Comprehensive Coverage**: Combines institutional knowledge (documents) with current medical knowledge (web)
187
+ ✅ **Quality Assurance**: Self-reflection ensures high-quality, complete answers
188
+ ✅ **Scalability**: Handles multiple languages, complex queries, and large document libraries
189
+ ✅ **Clinical Workflow Integration**: Designed to fit into existing clinical decision-making processes
190
+
191
+ ### **Implementation in Clinical Settings**
192
+
193
+ **Hospital Systems**: Deploy for clinical decision support, integrating with EMR systems and institutional medical libraries.
194
+
195
+ **Specialty Clinics**: Customize for specific medical specialties by uploading specialty-specific documents and guidelines.
196
+
197
+ **Medical Education**: Support medical training and education with comprehensive, evidence-based answers.
198
+
199
+ **Research Institutions**: Accelerate medical research by synthesizing information from multiple sources.
200
+
201
  ---
202
 
203
+ **Note**: This system is designed to **assist** medical professionals with information retrieval and synthesis. It does not replace clinical judgment. All medical decisions should be made by qualified healthcare professionals who consider the full clinical context, patient-specific factors, and their professional expertise.
204
 
205
+ > Introduction: A medical app for MCP-1st-Birthday hackathon, integrating MCP searcher and document RAG with autonomous reasoning, planning, and execution capabilities for enterprise-level clinical decision support.
app.py CHANGED
@@ -5,6 +5,7 @@ import logging
5
  import torch
6
  import threading
7
  import time
 
8
  from transformers import (
9
  AutoModelForCausalLM,
10
  AutoTokenizer,
@@ -351,6 +352,249 @@ def get_llm_for_rag(temperature=0.7, max_new_tokens=256, top_p=0.95, top_k=50):
351
  }
352
  )
353
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
354
  def extract_text_from_document(file):
355
  file_name = file.name
356
  file_extension = os.path.splitext(file_name)[1].lower()
@@ -493,7 +737,32 @@ def stream_chat(
493
  yield history + [{"role": "assistant", "content": "Session initialization failed. Please refresh the page."}]
494
  return
495
 
496
- # Detect language and translate if needed
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
497
  original_lang = detect_language(message)
498
  original_message = message
499
  needs_translation = original_lang != "en"
@@ -503,26 +772,27 @@ def stream_chat(
503
  message = translate_text(message, target_lang="en", source_lang=original_lang)
504
  logger.info(f"Translated query: {message}")
505
 
506
- user_id = request.session_hash
507
- index_dir = f"./{user_id}_index"
508
-
509
  # Initialize medical model
510
  medical_model_obj, medical_tokenizer = initialize_medical_model(medical_model)
511
 
512
- # Adjust system prompt based on RAG setting
513
- if use_rag:
514
- if not os.path.exists(index_dir):
515
  yield history + [{"role": "assistant", "content": "Please upload documents first to use RAG."}]
516
  return
517
 
518
- base_system_prompt = system_prompt if system_prompt else "As a medical specialist, provide detailed and accurate answers based on the provided medical documents."
519
  else:
520
  base_system_prompt = "As a medical specialist, provide short and concise clinical answers. Be brief and avoid lengthy explanations. Focus on key medical facts only."
521
 
522
- # Get RAG context if enabled
 
 
 
 
523
  rag_context = ""
524
  source_info = ""
525
- if use_rag and os.path.exists(index_dir):
526
  embed_model = HuggingFaceEmbedding(model_name=EMBEDDING_MODEL, token=HF_TOKEN)
527
  Settings.embed_model = embed_model
528
  storage_context = StorageContext.from_defaults(persist_dir=index_dir)
@@ -550,15 +820,15 @@ def stream_chat(
550
  if merged_file_sources:
551
  source_info = "\n\nRetrieved information from files: " + ", ".join(merged_file_sources.keys())
552
 
553
- # Get web search context if enabled
554
  web_context = ""
555
  web_sources = []
556
- if use_web_search:
557
- logger.info("Performing web search...")
558
  web_results = search_web(message, max_results=5)
559
  if web_results:
560
  web_summary = summarize_web_content(web_results, message)
561
- web_context = f"\n\nAdditional Web Sources:\n{web_summary}"
562
  web_sources = [r['title'] for r in web_results[:3]]
563
  logger.info(f"Web search completed, found {len(web_results)} results")
564
 
@@ -572,7 +842,7 @@ def stream_chat(
572
  full_context = "\n\n".join(context_parts) if context_parts else ""
573
 
574
  # Build system prompt
575
- if use_rag or use_web_search:
576
  formatted_system_prompt = f"{base_system_prompt}\n\n{full_context}{source_info}"
577
  else:
578
  formatted_system_prompt = base_system_prompt
@@ -676,6 +946,24 @@ def stream_chat(
676
  updated_history[-1]["content"] = partial_response
677
  yield updated_history
678
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
679
  # Translate back if needed
680
  if needs_translation and partial_response:
681
  logger.info(f"Translating response back to {original_lang}...")
 
5
  import torch
6
  import threading
7
  import time
8
+ import json
9
  from transformers import (
10
  AutoModelForCausalLM,
11
  AutoTokenizer,
 
352
  }
353
  )
354
 
355
+ def autonomous_reasoning(query: str, history: list) -> dict:
356
+ """
357
+ Autonomous reasoning: Analyze query complexity, intent, and information needs.
358
+ Returns reasoning analysis with query type, complexity, and required information sources.
359
+ """
360
+ global global_translation_model, global_translation_tokenizer
361
+ if global_translation_model is None or global_translation_tokenizer is None:
362
+ initialize_translation_model()
363
+
364
+ reasoning_prompt = f"""Analyze this medical query and provide structured reasoning:
365
+
366
+ Query: "{query}"
367
+
368
+ Analyze:
369
+ 1. Query Type: (diagnosis, treatment, drug_info, symptom_analysis, research, general_info)
370
+ 2. Complexity: (simple, moderate, complex, multi_faceted)
371
+ 3. Information Needs: What specific information is required?
372
+ 4. Requires RAG: (yes/no) - Does this need document context?
373
+ 5. Requires Web Search: (yes/no) - Does this need current/updated information?
374
+ 6. Sub-questions: Break down into key sub-questions if complex
375
+
376
+ Respond in JSON format:
377
+ {{
378
+ "query_type": "...",
379
+ "complexity": "...",
380
+ "information_needs": ["..."],
381
+ "requires_rag": true/false,
382
+ "requires_web_search": true/false,
383
+ "sub_questions": ["..."]
384
+ }}"""
385
+
386
+ messages = [
387
+ {"role": "system", "content": "You are a medical reasoning system. Analyze queries systematically and provide structured JSON responses."},
388
+ {"role": "user", "content": reasoning_prompt}
389
+ ]
390
+
391
+ prompt_text = global_translation_tokenizer.apply_chat_template(
392
+ messages,
393
+ tokenize=False,
394
+ add_generation_prompt=True
395
+ )
396
+
397
+ inputs = global_translation_tokenizer(prompt_text, return_tensors="pt").to(global_translation_model.device)
398
+
399
+ with torch.no_grad():
400
+ outputs = global_translation_model.generate(
401
+ **inputs,
402
+ max_new_tokens=512,
403
+ temperature=0.3,
404
+ do_sample=True,
405
+ pad_token_id=global_translation_tokenizer.eos_token_id
406
+ )
407
+
408
+ response = global_translation_tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
409
+
410
+ # Parse JSON response (with fallback)
411
+ try:
412
+ # Extract JSON from response
413
+ json_start = response.find('{')
414
+ json_end = response.rfind('}') + 1
415
+ if json_start >= 0 and json_end > json_start:
416
+ reasoning = json.loads(response[json_start:json_end])
417
+ else:
418
+ raise ValueError("No JSON found")
419
+ except:
420
+ # Fallback reasoning
421
+ reasoning = {
422
+ "query_type": "general_info",
423
+ "complexity": "moderate",
424
+ "information_needs": ["medical information"],
425
+ "requires_rag": True,
426
+ "requires_web_search": False,
427
+ "sub_questions": [query]
428
+ }
429
+
430
+ logger.info(f"Reasoning analysis: {reasoning}")
431
+ return reasoning
432
+
433
+ def create_execution_plan(reasoning: dict, query: str, has_rag_index: bool) -> dict:
434
+ """
435
+ Planning: Create multi-step execution plan based on reasoning analysis.
436
+ Returns execution plan with steps and strategy.
437
+ """
438
+ plan = {
439
+ "steps": [],
440
+ "strategy": "sequential",
441
+ "iterations": 1
442
+ }
443
+
444
+ # Determine execution strategy
445
+ if reasoning["complexity"] in ["complex", "multi_faceted"]:
446
+ plan["strategy"] = "iterative"
447
+ plan["iterations"] = 2
448
+
449
+ # Step 1: Language detection and translation
450
+ plan["steps"].append({
451
+ "step": 1,
452
+ "action": "detect_language",
453
+ "description": "Detect query language and translate if needed"
454
+ })
455
+
456
+ # Step 2: RAG retrieval (if needed and available)
457
+ if reasoning.get("requires_rag", True) and has_rag_index:
458
+ plan["steps"].append({
459
+ "step": 2,
460
+ "action": "rag_retrieval",
461
+ "description": "Retrieve relevant document context",
462
+ "parameters": {"top_k": 15, "merge_threshold": 0.5}
463
+ })
464
+
465
+ # Step 3: Web search (if needed)
466
+ if reasoning.get("requires_web_search", False):
467
+ plan["steps"].append({
468
+ "step": 3,
469
+ "action": "web_search",
470
+ "description": "Search web for current/updated information",
471
+ "parameters": {"max_results": 5}
472
+ })
473
+
474
+ # Step 4: Sub-question processing (if complex)
475
+ if reasoning.get("sub_questions") and len(reasoning["sub_questions"]) > 1:
476
+ plan["steps"].append({
477
+ "step": 4,
478
+ "action": "multi_step_reasoning",
479
+ "description": "Process sub-questions iteratively",
480
+ "sub_questions": reasoning["sub_questions"]
481
+ })
482
+
483
+ # Step 5: Synthesis and answer generation
484
+ plan["steps"].append({
485
+ "step": len(plan["steps"]) + 1,
486
+ "action": "synthesize_answer",
487
+ "description": "Generate comprehensive answer from all sources"
488
+ })
489
+
490
+ # Step 6: Self-reflection (for complex queries)
491
+ if reasoning["complexity"] in ["complex", "multi_faceted"]:
492
+ plan["steps"].append({
493
+ "step": len(plan["steps"]) + 1,
494
+ "action": "self_reflection",
495
+ "description": "Evaluate answer quality and completeness"
496
+ })
497
+
498
+ logger.info(f"Execution plan created: {len(plan['steps'])} steps")
499
+ return plan
500
+
501
+ def autonomous_execution_strategy(reasoning: dict, plan: dict, use_rag: bool, use_web_search: bool) -> dict:
502
+ """
503
+ Autonomous execution: Make decisions on information gathering strategy.
504
+ Overrides user settings if reasoning suggests better approach.
505
+ """
506
+ strategy = {
507
+ "use_rag": use_rag,
508
+ "use_web_search": use_web_search,
509
+ "reasoning_override": False,
510
+ "rationale": ""
511
+ }
512
+
513
+ # Autonomous decision: Override if reasoning suggests different approach
514
+ if reasoning.get("requires_rag", False) and not use_rag:
515
+ strategy["use_rag"] = True
516
+ strategy["reasoning_override"] = True
517
+ strategy["rationale"] += "Reasoning suggests RAG is needed for this query. "
518
+
519
+ if reasoning.get("requires_web_search", False) and not use_web_search:
520
+ strategy["use_web_search"] = True
521
+ strategy["reasoning_override"] = True
522
+ strategy["rationale"] += "Reasoning suggests web search for current information. "
523
+
524
+ if strategy["reasoning_override"]:
525
+ logger.info(f"Autonomous override: {strategy['rationale']}")
526
+
527
+ return strategy
528
+
529
+ def self_reflection(answer: str, query: str, reasoning: dict) -> dict:
530
+ """
531
+ Self-reflection: Evaluate answer quality and completeness.
532
+ Returns reflection with quality score and improvement suggestions.
533
+ """
534
+ global global_translation_model, global_translation_tokenizer
535
+ if global_translation_model is None or global_translation_tokenizer is None:
536
+ initialize_translation_model()
537
+
538
+ reflection_prompt = f"""Evaluate this medical answer for quality and completeness:
539
+
540
+ Query: "{query}"
541
+ Answer: "{answer[:1000]}"
542
+
543
+ Evaluate:
544
+ 1. Completeness: Does it address all aspects of the query?
545
+ 2. Accuracy: Is the medical information accurate?
546
+ 3. Clarity: Is it clear and well-structured?
547
+ 4. Sources: Are sources cited appropriately?
548
+ 5. Missing Information: What important information might be missing?
549
+
550
+ Respond in JSON:
551
+ {{
552
+ "completeness_score": 0-10,
553
+ "accuracy_score": 0-10,
554
+ "clarity_score": 0-10,
555
+ "overall_score": 0-10,
556
+ "missing_aspects": ["..."],
557
+ "improvement_suggestions": ["..."]
558
+ }}"""
559
+
560
+ messages = [
561
+ {"role": "system", "content": "You are a medical answer quality evaluator. Provide honest, constructive feedback."},
562
+ {"role": "user", "content": reflection_prompt}
563
+ ]
564
+
565
+ prompt_text = global_translation_tokenizer.apply_chat_template(
566
+ messages,
567
+ tokenize=False,
568
+ add_generation_prompt=True
569
+ )
570
+
571
+ inputs = global_translation_tokenizer(prompt_text, return_tensors="pt").to(global_translation_model.device)
572
+
573
+ with torch.no_grad():
574
+ outputs = global_translation_model.generate(
575
+ **inputs,
576
+ max_new_tokens=256,
577
+ temperature=0.3,
578
+ do_sample=True,
579
+ pad_token_id=global_translation_tokenizer.eos_token_id
580
+ )
581
+
582
+ response = global_translation_tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
583
+
584
+ import json
585
+ try:
586
+ json_start = response.find('{')
587
+ json_end = response.rfind('}') + 1
588
+ if json_start >= 0 and json_end > json_start:
589
+ reflection = json.loads(response[json_start:json_end])
590
+ else:
591
+ reflection = {"overall_score": 7, "improvement_suggestions": []}
592
+ except:
593
+ reflection = {"overall_score": 7, "improvement_suggestions": []}
594
+
595
+ logger.info(f"Self-reflection score: {reflection.get('overall_score', 'N/A')}")
596
+ return reflection
597
+
598
  def extract_text_from_document(file):
599
  file_name = file.name
600
  file_extension = os.path.splitext(file_name)[1].lower()
 
737
  yield history + [{"role": "assistant", "content": "Session initialization failed. Please refresh the page."}]
738
  return
739
 
740
+ user_id = request.session_hash
741
+ index_dir = f"./{user_id}_index"
742
+ has_rag_index = os.path.exists(index_dir)
743
+
744
+ # ===== AUTONOMOUS REASONING =====
745
+ logger.info("🤔 Starting autonomous reasoning...")
746
+ reasoning = autonomous_reasoning(message, history)
747
+
748
+ # ===== PLANNING =====
749
+ logger.info("📋 Creating execution plan...")
750
+ plan = create_execution_plan(reasoning, message, has_rag_index)
751
+
752
+ # ===== AUTONOMOUS EXECUTION STRATEGY =====
753
+ logger.info("🎯 Determining execution strategy...")
754
+ execution_strategy = autonomous_execution_strategy(reasoning, plan, use_rag, use_web_search)
755
+
756
+ # Use autonomous strategy decisions
757
+ final_use_rag = execution_strategy["use_rag"]
758
+ final_use_web_search = execution_strategy["use_web_search"]
759
+
760
+ # Show reasoning override message if applicable
761
+ reasoning_note = ""
762
+ if execution_strategy["reasoning_override"]:
763
+ reasoning_note = f"\n\n💡 *Autonomous Reasoning: {execution_strategy['rationale']}*"
764
+
765
+ # Detect language and translate if needed (Step 1 of plan)
766
  original_lang = detect_language(message)
767
  original_message = message
768
  needs_translation = original_lang != "en"
 
772
  message = translate_text(message, target_lang="en", source_lang=original_lang)
773
  logger.info(f"Translated query: {message}")
774
 
 
 
 
775
  # Initialize medical model
776
  medical_model_obj, medical_tokenizer = initialize_medical_model(medical_model)
777
 
778
+ # Adjust system prompt based on RAG setting and reasoning
779
+ if final_use_rag:
780
+ if not has_rag_index:
781
  yield history + [{"role": "assistant", "content": "Please upload documents first to use RAG."}]
782
  return
783
 
784
+ base_system_prompt = system_prompt if system_prompt else "As a medical specialist, provide clinical and concise answers based on the provided medical documents and context."
785
  else:
786
  base_system_prompt = "As a medical specialist, provide short and concise clinical answers. Be brief and avoid lengthy explanations. Focus on key medical facts only."
787
 
788
+ # Add reasoning context to system prompt for complex queries
789
+ if reasoning["complexity"] in ["complex", "multi_faceted"]:
790
+ base_system_prompt += f"\n\nQuery Analysis: This is a {reasoning['complexity']} {reasoning['query_type']} query. Address all sub-questions: {', '.join(reasoning.get('sub_questions', [])[:3])}"
791
+
792
+ # ===== EXECUTION: RAG Retrieval (Step 2) =====
793
  rag_context = ""
794
  source_info = ""
795
+ if final_use_rag and has_rag_index:
796
  embed_model = HuggingFaceEmbedding(model_name=EMBEDDING_MODEL, token=HF_TOKEN)
797
  Settings.embed_model = embed_model
798
  storage_context = StorageContext.from_defaults(persist_dir=index_dir)
 
820
  if merged_file_sources:
821
  source_info = "\n\nRetrieved information from files: " + ", ".join(merged_file_sources.keys())
822
 
823
+ # ===== EXECUTION: Web Search (Step 3) =====
824
  web_context = ""
825
  web_sources = []
826
+ if final_use_web_search:
827
+ logger.info("🌐 Performing web search (MCP)...")
828
  web_results = search_web(message, max_results=5)
829
  if web_results:
830
  web_summary = summarize_web_content(web_results, message)
831
+ web_context = f"\n\nAdditional Web Sources (MCP):\n{web_summary}"
832
  web_sources = [r['title'] for r in web_results[:3]]
833
  logger.info(f"Web search completed, found {len(web_results)} results")
834
 
 
842
  full_context = "\n\n".join(context_parts) if context_parts else ""
843
 
844
  # Build system prompt
845
+ if final_use_rag or final_use_web_search:
846
  formatted_system_prompt = f"{base_system_prompt}\n\n{full_context}{source_info}"
847
  else:
848
  formatted_system_prompt = base_system_prompt
 
946
  updated_history[-1]["content"] = partial_response
947
  yield updated_history
948
 
949
+ # ===== SELF-REFLECTION (Step 6) =====
950
+ if reasoning["complexity"] in ["complex", "multi_faceted"]:
951
+ logger.info("🔍 Performing self-reflection on answer quality...")
952
+ reflection = self_reflection(partial_response, message, reasoning)
953
+
954
+ # Add reflection note if score is low or improvements suggested
955
+ if reflection.get("overall_score", 10) < 7 or reflection.get("improvement_suggestions"):
956
+ reflection_note = f"\n\n---\n**Self-Reflection** (Score: {reflection.get('overall_score', 'N/A')}/10)"
957
+ if reflection.get("improvement_suggestions"):
958
+ reflection_note += f"\n💡 Suggestions: {', '.join(reflection['improvement_suggestions'][:2])}"
959
+ partial_response += reflection_note
960
+ updated_history[-1]["content"] = partial_response
961
+
962
+ # Add reasoning note if autonomous override occurred
963
+ if reasoning_note:
964
+ partial_response = reasoning_note + "\n\n" + partial_response
965
+ updated_history[-1]["content"] = partial_response
966
+
967
  # Translate back if needed
968
  if needs_translation and partial_response:
969
  logger.info(f"Translating response back to {original_lang}...")