LiamKhoaLe committed
Commit b8bf5c8 · 1 Parent(s): 410be5e

Update backend with full search service implementation; refactor directory.

.dockerignore ADDED
@@ -0,0 +1,4 @@
1
+ api/legacy.py
2
+ *.md
3
+ .env
4
+ *yml
Dockerfile CHANGED
@@ -24,7 +24,7 @@ RUN mkdir -p /app/model_cache /home/user/.cache/huggingface/sentence-transformer
24
  chown -R user:user /app/model_cache /home/user/.cache/huggingface
25
 
26
  # Pre-load model in a separate script
27
- RUN python /app/download_model.py && python /app/warmup.py
28
 
29
  # Ensure ownership and permissions remain intact
30
  RUN chown -R user:user /app/model_cache
@@ -32,5 +32,5 @@ RUN chown -R user:user /app/model_cache
32
  # Expose port
33
  EXPOSE 7860
34
 
35
- # Run the application
36
- CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "1"]
 
24
  chown -R user:user /app/model_cache /home/user/.cache/huggingface
25
 
26
  # Pre-load model in a separate script
27
+ RUN python /app/models/download_model.py && python /app/models/warmup.py
28
 
29
  # Ensure ownership and permissions remain intact
30
  RUN chown -R user:user /app/model_cache
 
32
  # Expose port
33
  EXPOSE 7860
34
 
35
+ # Run the application using main.py as entry point
36
+ CMD ["python", "main.py"]
README.md CHANGED
@@ -10,4 +10,123 @@ license: apache-2.0
10
  short_description: MedicalChatbot, FAISS, Gemini, MongoDB vDB, LRU
11
  ---
12
 
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
10
  short_description: MedicalChatbot, FAISS, Gemini, MongoDB vDB, LRU
11
  ---
12
 
13
+ # Medical Chatbot Backend
14
+
15
+ ## Project Structure
16
+
17
+ The backend is organized into logical modules for better maintainability:
18
+
19
+ ### 📁 **api/**
20
+ - **app.py** - Main FastAPI application with endpoints
21
+ - **__init__.py** - API package initialization
22
+
23
+ ### 📁 **models/**
24
+ - **llama.py** - NVIDIA Llama model integration for search processing
25
+ - **summarizer.py** - Text summarization using NVIDIA Llama
26
+ - **download_model.py** - Model download utilities
27
+ - **warmup.py** - Model warmup scripts
28
+
29
+ ### 📁 **memory/**
30
+ - **memory_updated.py** - Enhanced memory management with NVIDIA Llama summarization
31
+ - **memory.py** - Legacy memory implementation
32
+
33
+ ### 📁 **search/**
34
+ - **search.py** - Web search and content extraction functionality
35
+
36
+ ### 📁 **utils/**
37
+ - **translation.py** - Multi-language translation utilities
38
+ - **vlm.py** - Vision Language Model for medical image processing
39
+ - **diagnosis.py** - Symptom-based diagnosis utilities
40
+ - **connect_mongo.py** - MongoDB connection utilities
41
+ - **clear_mongo.py** - Database cleanup utilities
42
+ - **migrate.py** - Database migration scripts
43
+
44
+ ## Key Features
45
+
46
+ ### 🔍 **Search Integration**
47
+ - Web search with up to 10 resources
48
+ - NVIDIA Llama model for keyword generation and document summarization
49
+ - Citation system with URL mapping
50
+ - Smart content filtering and validation
51
+
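In code, these pieces line up with `api/chatbot.py`: `search_web` fetches the raw results and `process_search_query` has Llama generate keywords, per-document summaries, and a document-ID-to-URL map. A condensed sketch (assumes the repo's environment variables are already set):

```python
# Sketch of the search-mode flow used by RAGMedicalChatbot.chat (condensed from api/chatbot.py)
from search import search_web
from models import process_search_query

user_query = "latest guidance on managing type 2 diabetes"

search_results = search_web(user_query, num_results=10)   # up to 10 web resources
search_context, url_mapping = process_search_query(user_query, search_results)

# search_context is appended to the LLM prompt; url_mapping (doc ID -> URL)
# is later used to replace <#ID> citation tags in the generated answer
print(len(search_results), len(url_mapping))
```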
52
+ ### 🧠 **Enhanced Memory Management**
53
+ - NVIDIA Llama-powered summarization for all text processing
54
+ - Optimized chunking and context retrieval
55
+ - Smart deduplication and merging
56
+ - Conversation continuity with concise summaries
57
+
58
+ ### 📝 **Summarization System**
59
+ - **Text Cleaning**: Removes conversational fillers and normalizes text
60
+ - **Key Phrase Extraction**: Identifies medical terms and concepts
61
+ - **Concise Summaries**: Preserves key ideas without fluff
62
+ - **NVIDIA Llama Integration**: All summarization uses NVIDIA model instead of Gemini
63
+
64
+ ## Usage
65
+
66
+ ### Running the Application
67
+ ```bash
68
+ # Using main entry point
69
+ python main.py
70
+
71
+ # Or directly
72
+ python api/app.py
73
+ ```
74
+
75
+ ### Environment Variables
76
+ - `NVIDIA_URI` - NVIDIA API key for Llama model
77
+ - `FlashAPI` - Gemini API key
78
+ - `MONGO_URI` - MongoDB connection string
79
+ - `INDEX_URI` - FAISS index database URI
80
+
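These are read at startup; `api/config.py` raises immediately if the Gemini, Mongo, or index variables are missing. A minimal sketch of the same check, extended to cover all four variables listed above:

```python
# Sketch: fail fast if required environment variables are missing
# (mirrors the validation in api/config.py; extended to include NVIDIA_URI)
import os

REQUIRED_VARS = ["NVIDIA_URI", "FlashAPI", "MONGO_URI", "INDEX_URI"]

missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
if missing:
    raise ValueError(f"Missing environment variables: {', '.join(missing)}")
```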
81
+ ## API Endpoints
82
+
83
+ ### POST `/chat`
84
+ Main chat endpoint with search mode support.
85
+
86
+ **Request Body:**
87
+ ```json
88
+ {
89
+ "query": "User's medical question",
90
+ "lang": "EN",
91
+ "search": true,
92
+ "user_id": "unique_user_id",
93
+ "image_base64": "optional_base64_image",
94
+ "img_desc": "image_description"
95
+ }
96
+ ```
97
+
98
+ **Response:**
99
+ ```json
100
+ {
101
+ "response": "Medical response with citations <URL>",
102
+ "response_time": "2.34s"
103
+ }
104
+ ```
105
+
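For reference, a minimal client call might look like the sketch below; the host and port are assumptions for a local dev server, and the payload follows the request body documented above:

```python
# Sketch: call the /chat endpoint from Python (local dev server assumed)
import requests

payload = {
    "query": "What are common causes of a persistent cough?",
    "lang": "EN",
    "search": True,          # enable web search + citations
    "user_id": "demo-user",  # any stable identifier for memory continuity
}

resp = requests.post("http://localhost:7860/chat", json=payload, timeout=60)
print(resp.json()["response"])
```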
106
+ ## Search Mode Features
107
+
108
+ When `search: true`:
109
+ 1. **Web Search**: Fetches up to 10 relevant medical resources
110
+ 2. **Llama Processing**: Generates keywords and summarizes content
111
+ 3. **Citation System**: Replaces `<#ID>` tags with actual URLs
112
+ 4. **UI Integration**: Frontend displays magnifier icons for source links
113
+
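For illustration, the citation step (see `_process_citations` in `api/chatbot.py`) rewrites the model's `<#ID>` tags using the URL mapping returned by the search pipeline. A simplified sketch with made-up data:

```python
# Sketch: replace <#ID> citation tags with the mapped source URLs (simplified)
import re

url_mapping = {1: "https://example.org/cough-overview"}  # hypothetical doc ID -> URL map
text = "A persistent cough often resolves within three weeks <#1>."

def cite(match: re.Match) -> str:
    doc_id = int(match.group(1))
    return f"<{url_mapping[doc_id]}>" if doc_id in url_mapping else match.group(0)

print(re.sub(r"<#(\d+)>", cite, text))
# -> "A persistent cough often resolves within three weeks <https://example.org/cough-overview>."
```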
114
+ ## Summarization Features
115
+
116
+ All summarization tasks use the NVIDIA Llama model:
117
+ - **get_contextual_chunks**: Summarizes conversation history and RAG chunks
118
+ - **chunk_response**: Chunks and summarizes bot responses
119
+ - **summarize_documents**: Summarizes web search results
120
+
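A sketch of how the rest of the codebase calls the shared `summarizer` instance exported by `models/`; the signatures are inferred from the call sites in this commit, so treat them as indicative rather than a formal API reference:

```python
# Sketch of the summarizer call sites used in memory/memory.py and models/llama.py
from models import summarizer

history = "User reported a dry cough for two weeks. Bot suggested hydration and rest."
documents = [{"id": 1, "url": "https://example.org/cough", "title": "Cough", "content": "..."}]

context_summary = summarizer.summarize_text(history, max_length=300)              # memory context
chunks = summarizer.chunk_response("Rest, fluids, and follow up.", max_chunk_size=500)
combined_summary, url_mapping = summarizer.summarize_documents(documents, "persistent cough")
```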
121
+ ### Text Processing Pipeline
122
+ 1. **Clean Text**: Remove conversational elements and normalize
123
+ 2. **Extract Key Phrases**: Identify medical terms and concepts
124
+ 3. **Summarize**: Create concise, focused summaries
125
+ 4. **Validate**: Ensure quality and relevance
126
+
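A rough sketch of that pipeline as plain functions; the helper names are illustrative stand-ins, not the actual functions in `models/summarizer.py`:

```python
# Sketch: clean -> extract key phrases -> summarize -> validate (illustrative helpers only)
import re
from typing import List

def clean_text(text: str) -> str:
    """Drop conversational fillers and collapse whitespace."""
    text = re.sub(r"\b(um|uh|you know|i think)\b", "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()

def extract_key_phrases(text: str) -> List[str]:
    """Rough stand-in: keep longer tokens as candidate medical terms."""
    return re.findall(r"[A-Za-z-]{6,}", text)

def summarize(text: str, max_words: int = 60) -> str:
    """Placeholder for the NVIDIA Llama call that produces the concise summary."""
    return " ".join(text.split()[:max_words])

def validate(summary: str) -> bool:
    """Basic sanity check before the summary is stored or injected into a prompt."""
    return 0 < len(summary.split()) <= 120

raw = "Um, I think the patient reports a persistent cough and mild fever since Monday."
cleaned = clean_text(raw)
print(extract_key_phrases(cleaned))
print(summarize(cleaned), validate(summarize(cleaned)))
```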
127
+ ## Dependencies
128
+
129
+ See `requirements.txt` for the complete list. Key additions:
130
+ - `requests` - Web search functionality
131
+ - `beautifulsoup4` - HTML content extraction
132
+ - NVIDIA API integration for Llama model
api/README.md ADDED
@@ -0,0 +1,96 @@
1
+ # API Module Structure
2
+
3
+ ## 📁 **Module Overview**
4
+
5
+ ### **config.py** - Configuration Management
6
+ - Environment variables validation
7
+ - Logging configuration
8
+ - System resource monitoring
9
+ - Memory optimization settings
10
+ - CORS configuration
11
+
12
+ ### **database.py** - Database Management
13
+ - MongoDB connection management
14
+ - FAISS index lazy loading
15
+ - SentenceTransformer model initialization
16
+ - Symptom vectors management
17
+ - GridFS integration
18
+
19
+ ### **retrieval.py** - RAG Retrieval Engine
20
+ - Medical information retrieval from FAISS
21
+ - Symptom-based diagnosis retrieval
22
+ - Smart deduplication and similarity matching
23
+ - Vector similarity computations
24
+
25
+ ### **chatbot.py** - Core Chatbot Logic
26
+ - RAGMedicalChatbot class
27
+ - Gemini API client
28
+ - Search mode integration
29
+ - Citation processing
30
+ - Memory management integration
31
+
32
+ ### **routes.py** - API Endpoints
33
+ - `/chat` - Main chat endpoint
34
+ - `/health` - Health check
35
+ - `/` - Root endpoint
36
+ - Request/response handling
37
+
38
+ ### **app.py** - Main Application
39
+ - FastAPI app initialization
40
+ - Middleware configuration
41
+ - Database initialization
42
+ - Route registration
43
+ - Server startup
44
+
45
+ ## 🔄 **Data Flow**
46
+
47
+ ```
48
+ Request → routes.py → chatbot.py → retrieval.py → database.py
49
+
50
+ memory.py (context) + search.py (web search)
51
+
52
+ models/ (NVIDIA Llama processing)
53
+
54
+ Response with citations
55
+ ```
56
+
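Expressed as calls, the flow above corresponds roughly to the snippet below, condensed from `api/routes.py` and `api/chatbot.py` (error handling and response timing omitted):

```python
# Sketch: one request traced through the modular layers (condensed from routes.py)
from api.chatbot import RAGMedicalChatbot
from api.retrieval import retrieval_engine

chatbot = RAGMedicalChatbot(
    model_name="gemini-2.5-flash",
    retrieve_function=retrieval_engine.retrieve_medical_info,  # retrieval.py -> database.py
)

# routes.py does this for POST /chat, then wraps the answer in a JSONResponse
answer = chatbot.chat(
    user_id="demo-user",
    user_query="What can cause recurring headaches?",
    lang="EN",
    image_diagnosis="",   # filled by utils.process_medical_image when an image is attached
    search_mode=False,    # True triggers search.py + models/ (NVIDIA Llama processing)
)
print(answer)
```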
57
+ ## 🚀 **Benefits of Modular Structure**
58
+
59
+ 1. **Separation of Concerns**: Each module has a single responsibility
60
+ 2. **Easier Testing**: Individual modules can be tested in isolation
61
+ 3. **Better Maintainability**: Changes to one module don't affect others
62
+ 4. **Improved Readability**: Smaller files are easier to understand
63
+ 5. **Reusability**: Modules can be imported and used elsewhere
64
+ 6. **Scalability**: Easy to add new features without affecting existing code
65
+
66
+ ## 📊 **File Sizes Comparison**
67
+
68
+ | File | Lines | Purpose |
69
+ |------|-------|---------|
70
+ | **app_old.py** | 370 | Monolithic (everything) |
71
+ | **app.py** | 45 | Main app initialization |
72
+ | **config.py** | 65 | Configuration |
73
+ | **database.py** | 95 | Database management |
74
+ | **retrieval.py** | 85 | RAG retrieval |
75
+ | **chatbot.py** | 120 | Chatbot logic |
76
+ | **routes.py** | 55 | API endpoints |
77
+ | **Total** | 465 | Modular structure |
78
+
79
+ ## 🔧 **Usage**
80
+
81
+ The modular structure maintains the same API interface:
82
+
83
+ ```python
84
+ # All imports work the same way
85
+ from api.app import app
86
+ from api.chatbot import RAGMedicalChatbot
87
+ from api.retrieval import retrieval_engine
88
+ ```
89
+
90
+ ## 🛠 **Development Benefits**
91
+
92
+ - **Easier Debugging**: Issues can be isolated to specific modules
93
+ - **Parallel Development**: Multiple developers can work on different modules
94
+ - **Code Reviews**: Smaller files are easier to review
95
+ - **Documentation**: Each module can have focused documentation
96
+ - **Testing**: Unit tests can be written for each module independently
api/__init__.py ADDED
@@ -0,0 +1,2 @@
1
+ # API package
2
+ # Main API endpoints and routes
api/app.py ADDED
@@ -0,0 +1,54 @@
1
+ # api/app.py
2
+ import uvicorn
3
+ from fastapi import FastAPI
4
+ from fastapi.middleware.cors import CORSMiddleware
5
+ from api.config import setup_logging, check_system_resources, optimize_memory, CORS_ORIGINS
6
+ from api.database import db_manager
7
+ from api.routes import router
8
+
9
+ # ✅ Setup logging
10
+ logger = setup_logging()
11
+ logger.info("🚀 Starting Medical Chatbot API...")
12
+
13
+ # ✅ Monitor system resources
14
+ check_system_resources(logger)
15
+
16
+ # ✅ Optimize memory usage
17
+ optimize_memory()
18
+
19
+ # ✅ Initialize FastAPI app
20
+ app = FastAPI(
21
+ title="Medical Chatbot API",
22
+ description="AI-powered medical chatbot with RAG and search capabilities",
23
+ version="1.0.0"
24
+ )
25
+
26
+ # ✅ Add CORS middleware
27
+ app.add_middleware(
28
+ CORSMiddleware,
29
+ allow_origins=CORS_ORIGINS,
30
+ allow_credentials=True,
31
+ allow_methods=["*"],
32
+ allow_headers=["*"],
33
+ )
34
+
35
+ # ✅ Initialize database connections
36
+ try:
37
+ db_manager.initialize_embedding_model()
38
+ db_manager.initialize_mongodb()
39
+ logger.info("✅ Database connections initialized successfully")
40
+ except Exception as e:
41
+ logger.error(f"❌ Database initialization failed: {e}")
42
+ raise
43
+
44
+ # ✅ Include routes
45
+ app.include_router(router)
46
+
47
+ # ✅ Run Uvicorn
48
+ if __name__ == "__main__":
49
+ logger.info("[System] ✅ Starting FastAPI Server...")
50
+ try:
51
+ uvicorn.run(app, host="0.0.0.0", port=7860, log_level="info")
52
+ except Exception as e:
53
+ logger.error(f"❌ Server Startup Failed: {e}")
54
+ exit(1)
api/chatbot.py ADDED
@@ -0,0 +1,140 @@
1
+ # api/chatbot.py
2
+ import re
3
+ import logging
4
+ from typing import Dict
5
+ from google import genai
6
+ from api.config import gemini_flash_api_key
7
+ from api.retrieval import retrieval_engine
8
+ from memory import MemoryManager
9
+ from utils import translate_query, process_medical_image
10
+ from search import search_web
11
+ from models import process_search_query
12
+
13
+ logger = logging.getLogger("medical-chatbot")
14
+
15
+ class GeminiClient:
16
+ """Gemini API client for generating responses"""
17
+
18
+ def __init__(self):
19
+ self.client = genai.Client(api_key=gemini_flash_api_key)
20
+
21
+ def generate_content(self, prompt: str, model: str = "gemini-2.5-flash", temperature: float = 0.7) -> str:
22
+ """Generate content using Gemini API"""
23
+ try:
24
+ response = self.client.models.generate_content(model=model, contents=prompt)
25
+ return response.text
26
+ except Exception as e:
27
+ logger.error(f"[LLM] ❌ Error calling Gemini API: {e}")
28
+ return "Error generating response from Gemini."
29
+
30
+ class RAGMedicalChatbot:
31
+ """Main chatbot class with RAG capabilities"""
32
+
33
+ def __init__(self, model_name: str, retrieve_function):
34
+ self.model_name = model_name
35
+ self.retrieve = retrieve_function
36
+ self.gemini_client = GeminiClient()
37
+ self.memory = MemoryManager()
38
+
39
+ def chat(self, user_id: str, user_query: str, lang: str = "EN", image_diagnosis: str = "", search_mode: bool = False) -> str:
40
+ """Main chat method with RAG and search capabilities"""
41
+
42
+ # 0. Translate query if not EN; this helps our RAG system
43
+ if lang.upper() in {"VI", "ZH"}:
44
+ user_query = translate_query(user_query, lang.lower())
45
+
46
+ # 1. Fetch knowledge
47
+ ## a. KB for generic QA retrieval
48
+ retrieved_info = self.retrieve(user_query)
49
+ knowledge_base = "\n".join(retrieved_info)
50
+ ## b. Diagnosis RAG from symptom query
51
+ diagnosis_guides = retrieval_engine.retrieve_diagnosis_from_symptoms(user_query)
52
+
53
+ # 1.5. Search mode - web search and Llama processing
54
+ search_context = ""
55
+ url_mapping = {}
56
+ if search_mode:
57
+ logger.info(f"[SEARCH] Starting web search mode for query: {user_query}")
58
+ try:
59
+ # Search the web with max 10 resources
60
+ search_results = search_web(user_query, num_results=10)
61
+ if search_results:
62
+ logger.info(f"[SEARCH] Retrieved {len(search_results)} web resources")
63
+ # Process with Llama
64
+ search_context, url_mapping = process_search_query(user_query, search_results)
65
+ logger.info(f"[SEARCH] Processed with Llama, generated {len(url_mapping)} URL mappings")
66
+ else:
67
+ logger.warning("[SEARCH] No search results found")
68
+ except Exception as e:
69
+ logger.error(f"[SEARCH] Search failed: {e}")
70
+ search_context = ""
71
+
72
+ # 2. Hybrid Context Retrieval: RAG + Recent History + Intelligent Selection
73
+ contextual_chunks = self.memory.get_contextual_chunks(user_id, user_query, lang)
74
+
75
+ # 3. Build prompt parts
76
+ parts = ["You are a medical chatbot, designed to answer medical questions."]
77
+ parts.append("Please format your answer using MarkDown.")
78
+ parts.append("**Bold for titles**, *italic for emphasis*, and clear headings.")
79
+
80
+ # 4. Append image diagnosis from VLM
81
+ if image_diagnosis:
82
+ parts.append(
83
+ "A user medical image is diagnosed by our VLM agent:\n"
84
+ f"{image_diagnosis}\n\n"
85
+ "Please incorporate the above findings in your response if medically relevant.\n\n"
86
+ )
87
+
88
+ # Append contextual chunks from hybrid approach
89
+ if contextual_chunks:
90
+ parts.append("Relevant context from conversation history:\n" + contextual_chunks)
91
+ # Load up guideline (RAG over medical knowledge base)
92
+ if knowledge_base:
93
+ parts.append(f"Example Q&A medical scenario knowledge-base: {knowledge_base}")
94
+ # Symptom-Diagnosis prediction RAG
95
+ if diagnosis_guides:
96
+ parts.append("Symptom-based diagnosis guidance (if applicable):\n" + "\n".join(diagnosis_guides))
97
+
98
+ # 5. Search context with citation instructions
99
+ if search_context:
100
+ parts.append("Additional information from web search:\n" + search_context)
101
+ parts.append("IMPORTANT: When you use information from the web search results above, you MUST add a citation tag <#ID> immediately after the relevant content, where ID is the document number (1, 2, 3, etc.). For example: 'According to recent studies <#1>, this condition affects...'")
102
+
103
+ parts.append(f"User's question: {user_query}")
104
+ parts.append(f"Language to generate answer: {lang}")
105
+ prompt = "\n\n".join(parts)
106
+ logger.info(f"[LLM] Question query in `prompt`: {prompt}") # Debug out checking RAG on kb and history
107
+ response = self.gemini_client.generate_content(prompt, model=self.model_name, temperature=0.7)
108
+
109
+ # 6. Process citations and replace with URLs
110
+ if search_mode and url_mapping:
111
+ response = self._process_citations(response, url_mapping)
112
+
113
+ # Store exchange + chunking
114
+ if user_id:
115
+ self.memory.add_exchange(user_id, user_query, response, lang=lang)
116
+ logger.info(f"[LLM] Response on `prompt`: {response.strip()}") # Debug out base response
117
+ return response.strip()
118
+
119
+ def _process_citations(self, response: str, url_mapping: Dict[int, str]) -> str:
120
+ """Replace citation tags with actual URLs"""
121
+
122
+ # Find all citation tags like <#1>, <#2>, etc.
123
+ citation_pattern = r'<#(\d+)>'
124
+ citations_found = re.findall(citation_pattern, response)
125
+
126
+ def replace_citation(match):
127
+ doc_id = int(match.group(1))
128
+ if doc_id in url_mapping:
129
+ url = url_mapping[doc_id]
130
+ logger.info(f"[CITATION] Replacing <#{doc_id}> with {url}")
131
+ return f'<{url}>'
132
+ else:
133
+ logger.warning(f"[CITATION] No URL mapping found for document ID {doc_id}")
134
+ return match.group(0) # Keep original if URL not found
135
+
136
+ # Replace citations with URLs
137
+ processed_response = re.sub(citation_pattern, replace_citation, response)
138
+
139
+ logger.info(f"[CITATION] Processed {len(citations_found)} citations, {len(url_mapping)} URL mappings available")
140
+ return processed_response
api/config.py ADDED
@@ -0,0 +1,70 @@
1
+ # api/config.py
2
+ import os
3
+ import logging
4
+ import psutil
5
+ from typing import List
6
+
7
+ # ✅ Environment Variables
8
+ mongo_uri = os.getenv("MONGO_URI")
9
+ index_uri = os.getenv("INDEX_URI")
10
+ gemini_flash_api_key = os.getenv("FlashAPI")
11
+
12
+ # Validate environment endpoint
13
+ if not all([gemini_flash_api_key, mongo_uri, index_uri]):
14
+ raise ValueError("❌ Missing API keys! Set them in Hugging Face Secrets.")
15
+
16
+ # ✅ Logging Configuration
17
+ def setup_logging():
18
+ """Configure logging for the application"""
19
+ # Silence noisy loggers
20
+ for name in [
21
+ "uvicorn.error", "uvicorn.access",
22
+ "fastapi", "starlette",
23
+ "pymongo", "gridfs",
24
+ "sentence_transformers", "faiss",
25
+ "google", "google.auth",
26
+ ]:
27
+ logging.getLogger(name).setLevel(logging.WARNING)
28
+
29
+ logging.basicConfig(
30
+ level=logging.INFO,
31
+ format="%(asctime)s — %(name)s — %(levelname)s — %(message)s",
32
+ force=True
33
+ )
34
+
35
+ logger = logging.getLogger("medical-chatbot")
36
+ logger.setLevel(logging.DEBUG)
37
+ return logger
38
+
39
+ # ✅ System Resource Monitoring
40
+ def check_system_resources(logger):
41
+ """Monitor system resources and log warnings"""
42
+ memory = psutil.virtual_memory()
43
+ cpu = psutil.cpu_percent(interval=1)
44
+ disk = psutil.disk_usage("/")
45
+
46
+ logger.info(f"[System] 🔍 System Resources - RAM: {memory.percent}%, CPU: {cpu}%, Disk: {disk.percent}%")
47
+
48
+ if memory.percent > 85:
49
+ logger.warning("⚠️ High RAM usage detected!")
50
+ if cpu > 90:
51
+ logger.warning("⚠️ High CPU usage detected!")
52
+ if disk.percent > 90:
53
+ logger.warning("⚠️ High Disk usage detected!")
54
+
55
+ # ✅ Memory Optimization
56
+ def optimize_memory():
57
+ """Set environment variables for memory optimization"""
58
+ os.environ["OMP_NUM_THREADS"] = "1"
59
+ os.environ["TOKENIZERS_PARALLELISM"] = "false"
60
+
61
+ # ✅ CORS Configuration
62
+ CORS_ORIGINS = [
63
+ "http://localhost:5173", # Vite dev server
64
+ "http://localhost:3000", # Another vercel local dev
65
+ "https://medical-chatbot-henna.vercel.app", # ✅ Vercel frontend production URL
66
+ ]
67
+
68
+ # ✅ Model Configuration
69
+ MODEL_CACHE_DIR = "/app/model_cache"
70
+ EMBEDDING_MODEL_DEVICE = "cpu"
api/database.py ADDED
@@ -0,0 +1,100 @@
1
+ # api/database.py
2
+ import faiss
3
+ import numpy as np
4
+ import gridfs
5
+ from pymongo import MongoClient
6
+ from sentence_transformers import SentenceTransformer
7
+ from api.config import mongo_uri, index_uri, MODEL_CACHE_DIR, EMBEDDING_MODEL_DEVICE
8
+ import logging
9
+
10
+ logger = logging.getLogger("medical-chatbot")
11
+
12
+ class DatabaseManager:
13
+ def __init__(self):
14
+ self.embedding_model = None
15
+ self.index = None
16
+ self.symptom_vectors = None
17
+ self.symptom_docs = None
18
+
19
+ # MongoDB connections
20
+ self.client = None
21
+ self.iclient = None
22
+ self.symptom_client = None
23
+
24
+ # Collections
25
+ self.qa_collection = None
26
+ self.index_collection = None
27
+ self.symptom_col = None
28
+ self.fs = None
29
+
30
+ def initialize_embedding_model(self):
31
+ """Initialize the SentenceTransformer model"""
32
+ logger.info("[Embedder] 📥 Loading SentenceTransformer Model...")
33
+ try:
34
+ self.embedding_model = SentenceTransformer(MODEL_CACHE_DIR, device=EMBEDDING_MODEL_DEVICE)
35
+ self.embedding_model = self.embedding_model.half() # Reduce memory
36
+ logger.info("✅ Model Loaded Successfully.")
37
+ except Exception as e:
38
+ logger.error(f"❌ Model Loading Failed: {e}")
39
+ raise
40
+
41
+ def initialize_mongodb(self):
42
+ """Initialize MongoDB connections and collections"""
43
+ # QA data
44
+ self.client = MongoClient(mongo_uri)
45
+ db = self.client["MedicalChatbotDB"]
46
+ self.qa_collection = db["qa_data"]
47
+
48
+ # FAISS Index data
49
+ self.iclient = MongoClient(index_uri)
50
+ idb = self.iclient["MedicalChatbotDB"]
51
+ self.index_collection = idb["faiss_index_files"]
52
+
53
+ # Symptom Diagnosis data
54
+ self.symptom_client = MongoClient(mongo_uri)
55
+ self.symptom_col = self.symptom_client["MedicalChatbotDB"]["symptom_diagnosis"]
56
+
57
+ # GridFS for FAISS index
58
+ self.fs = gridfs.GridFS(idb, collection="faiss_index_files")
59
+
60
+ def load_faiss_index(self):
61
+ """Lazy load FAISS index from GridFS"""
62
+ if self.index is None:
63
+ logger.info("[KB] ⏳ Loading FAISS index from GridFS...")
64
+ existing_file = self.fs.find_one({"filename": "faiss_index.bin"})
65
+ if existing_file:
66
+ stored_index_bytes = existing_file.read()
67
+ index_bytes_np = np.frombuffer(stored_index_bytes, dtype='uint8')
68
+ self.index = faiss.deserialize_index(index_bytes_np)
69
+ logger.info("[KB] ✅ FAISS Index Loaded")
70
+ else:
71
+ logger.error("[KB] ❌ FAISS index not found in GridFS.")
72
+ return self.index
73
+
74
+ def load_symptom_vectors(self):
75
+ """Lazy load symptom vectors for diagnosis"""
76
+ if self.symptom_vectors is None:
77
+ all_docs = list(self.symptom_col.find({}, {"embedding": 1, "answer": 1, "question": 1, "prognosis": 1}))
78
+ self.symptom_docs = all_docs
79
+ self.symptom_vectors = np.array([doc["embedding"] for doc in all_docs], dtype=np.float32)
80
+
81
+ def get_embedding_model(self):
82
+ """Get the embedding model"""
83
+ if self.embedding_model is None:
84
+ self.initialize_embedding_model()
85
+ return self.embedding_model
86
+
87
+ def get_qa_collection(self):
88
+ """Get QA collection"""
89
+ if self.qa_collection is None:
90
+ self.initialize_mongodb()
91
+ return self.qa_collection
92
+
93
+ def get_symptom_collection(self):
94
+ """Get symptom collection"""
95
+ if self.symptom_col is None:
96
+ self.initialize_mongodb()
97
+ return self.symptom_col
98
+
99
+ # Global database manager instance
100
+ db_manager = DatabaseManager()
app.py → api/legacy.py RENAMED
@@ -1,5 +1,6 @@
1
  # app.py
2
- import os
 
3
  import faiss
4
  import numpy as np
5
  import time
@@ -11,10 +12,9 @@ from google import genai
11
  from sentence_transformers import SentenceTransformer
12
  from sentence_transformers.util import cos_sim
13
  from memory import MemoryManager
14
- from translation import translate_query
15
- from vlm import process_medical_image
16
  from search import search_web
17
- from llama_integration import process_search_query
18
 
19
  # ✅ Enable Logging for Debugging
20
  import logging
@@ -239,14 +239,15 @@ class RAGMedicalChatbot:
239
  search_context = ""
240
  url_mapping = {}
241
  if search_mode:
242
- logger.info("[SEARCH] Starting web search mode")
243
  try:
244
- # Search the web
245
- search_results = search_web(user_query, num_results=5)
246
  if search_results:
 
247
  # Process with Llama
248
  search_context, url_mapping = process_search_query(user_query, search_results)
249
- logger.info(f"[SEARCH] Found {len(search_results)} results, processed with Llama")
250
  else:
251
  logger.warning("[SEARCH] No search results found")
252
  except Exception as e:
@@ -306,17 +307,22 @@ class RAGMedicalChatbot:
306
 
307
  # Find all citation tags like <#1>, <#2>, etc.
308
  citation_pattern = r'<#(\d+)>'
 
309
 
310
  def replace_citation(match):
311
  doc_id = int(match.group(1))
312
  if doc_id in url_mapping:
313
- return f'<{url_mapping[doc_id]}>'
314
- return match.group(0) # Keep original if URL not found
 
 
 
 
315
 
316
  # Replace citations with URLs
317
  processed_response = re.sub(citation_pattern, replace_citation, response)
318
 
319
- logger.info(f"[CITATION] Processed citations, found {len(re.findall(citation_pattern, response))} citations")
320
  return processed_response
321
 
322
  # ✅ Initialize Chatbot
 
1
  # app.py
2
+ import os, json, re
3
+ from typing import Dict
4
  import faiss
5
  import numpy as np
6
  import time
 
12
  from sentence_transformers import SentenceTransformer
13
  from sentence_transformers.util import cos_sim
14
  from memory import MemoryManager
15
+ from utils import translate_query, process_medical_image, retrieve_diagnosis_from_symptoms
 
16
  from search import search_web
17
+ from models import process_search_query
18
 
19
  # ✅ Enable Logging for Debugging
20
  import logging
 
239
  search_context = ""
240
  url_mapping = {}
241
  if search_mode:
242
+ logger.info(f"[SEARCH] Starting web search mode for query: {user_query}")
243
  try:
244
+ # Search the web with max 10 resources
245
+ search_results = search_web(user_query, num_results=10)
246
  if search_results:
247
+ logger.info(f"[SEARCH] Retrieved {len(search_results)} web resources")
248
  # Process with Llama
249
  search_context, url_mapping = process_search_query(user_query, search_results)
250
+ logger.info(f"[SEARCH] Processed with Llama, generated {len(url_mapping)} URL mappings")
251
  else:
252
  logger.warning("[SEARCH] No search results found")
253
  except Exception as e:
 
307
 
308
  # Find all citation tags like <#1>, <#2>, etc.
309
  citation_pattern = r'<#(\d+)>'
310
+ citations_found = re.findall(citation_pattern, response)
311
 
312
  def replace_citation(match):
313
  doc_id = int(match.group(1))
314
  if doc_id in url_mapping:
315
+ url = url_mapping[doc_id]
316
+ logger.info(f"[CITATION] Replacing <#{doc_id}> with {url}")
317
+ return f'<{url}>'
318
+ else:
319
+ logger.warning(f"[CITATION] No URL mapping found for document ID {doc_id}")
320
+ return match.group(0) # Keep original if URL not found
321
 
322
  # Replace citations with URLs
323
  processed_response = re.sub(citation_pattern, replace_citation, response)
324
 
325
+ logger.info(f"[CITATION] Processed {len(citations_found)} citations, {len(url_mapping)} URL mappings available")
326
  return processed_response
327
 
328
  # ✅ Initialize Chatbot
api/retrieval.py ADDED
@@ -0,0 +1,100 @@
1
+ # api/retrieval.py
2
+ import numpy as np
3
+ import logging
4
+ from api.database import db_manager
5
+
6
+ logger = logging.getLogger("medical-chatbot")
7
+
8
+ class RetrievalEngine:
9
+ def __init__(self):
10
+ self.db_manager = db_manager
11
+
12
+ def retrieve_medical_info(self, query: str, k: int = 5, min_sim: float = 0.9) -> list:
13
+ """
14
+ Retrieve medical information from FAISS index
15
+ Minimum similarity between the query and KB entries defaults to min_sim (0.9)
16
+ """
17
+ index = self.db_manager.load_faiss_index()
18
+ if index is None:
19
+ return [""]
20
+
21
+ embedding_model = self.db_manager.get_embedding_model()
22
+ qa_collection = self.db_manager.get_qa_collection()
23
+
24
+ # Embed query
25
+ query_vec = embedding_model.encode([query], convert_to_numpy=True)
26
+ D, I = index.search(query_vec, k=k)
27
+
28
+ # Filter by cosine threshold
29
+ results = []
30
+ kept = []
31
+ kept_vecs = []
32
+
33
+ # Smart dedup on cosine threshold between similar candidates
34
+ for score, idx in zip(D[0], I[0]):
35
+ if score < min_sim:
36
+ continue
37
+
38
+ # List sim docs
39
+ doc = qa_collection.find_one({"i": int(idx)})
40
+ if not doc:
41
+ continue
42
+
43
+ # Only compare answers
44
+ answer = doc.get("Doctor", "").strip()
45
+ if not answer:
46
+ continue
47
+
48
+ # Check semantic redundancy among previously kept results
49
+ new_vec = embedding_model.encode([answer], convert_to_numpy=True)[0]
50
+ is_similar = False
51
+
52
+ for i, vec in enumerate(kept_vecs):
53
+ sim = np.dot(vec, new_vec) / (np.linalg.norm(vec) * np.linalg.norm(new_vec) + 1e-9)
54
+ if sim >= 0.9: # High semantic similarity
55
+ is_similar = True
56
+ # Keep only better match to original query
57
+ cur_sim_to_query = np.dot(vec, query_vec[0]) / (np.linalg.norm(vec) * np.linalg.norm(query_vec[0]) + 1e-9)
58
+ new_sim_to_query = np.dot(new_vec, query_vec[0]) / (np.linalg.norm(new_vec) * np.linalg.norm(query_vec[0]) + 1e-9)
59
+ if new_sim_to_query > cur_sim_to_query:
60
+ kept[i] = answer
61
+ kept_vecs[i] = new_vec
62
+ break
63
+
64
+ # Non-similar candidates
65
+ if not is_similar:
66
+ kept.append(answer)
67
+ kept_vecs.append(new_vec)
68
+
69
+ return kept if kept else [""]
70
+
71
+ def retrieve_diagnosis_from_symptoms(self, symptom_text: str, top_k: int = 5, min_sim: float = 0.5) -> list:
72
+ """
73
+ Retrieve diagnosis information from symptom vectors
74
+ """
75
+ self.db_manager.load_symptom_vectors()
76
+ embedding_model = self.db_manager.get_embedding_model()
77
+
78
+ # Embed input
79
+ qvec = embedding_model.encode(symptom_text, convert_to_numpy=True)
80
+ qvec = qvec / (np.linalg.norm(qvec) + 1e-9)
81
+
82
+ # Similarity compute
83
+ sims = self.db_manager.symptom_vectors @ qvec # cosine
84
+ sorted_idx = np.argsort(sims)[-top_k:][::-1]
85
+ seen_diag = set()
86
+ final = [] # Dedup
87
+
88
+ for i in sorted_idx:
89
+ sim = sims[i]
90
+ if sim < min_sim:
91
+ continue
92
+ label = self.db_manager.symptom_docs[i]["prognosis"]
93
+ if label not in seen_diag:
94
+ final.append(self.db_manager.symptom_docs[i]["answer"])
95
+ seen_diag.add(label)
96
+
97
+ return final
98
+
99
+ # Global retrieval engine instance
100
+ retrieval_engine = RetrievalEngine()
api/routes.py ADDED
@@ -0,0 +1,63 @@
1
+ # api/routes.py
2
+ import time
3
+ import logging
4
+ from fastapi import APIRouter, Request
5
+ from fastapi.responses import JSONResponse
6
+ from api.chatbot import RAGMedicalChatbot
7
+ from api.retrieval import retrieval_engine
8
+ from utils import process_medical_image
9
+
10
+ logger = logging.getLogger("medical-chatbot")
11
+
12
+ # Create router
13
+ router = APIRouter()
14
+
15
+ # Initialize chatbot
16
+ chatbot = RAGMedicalChatbot(
17
+ model_name="gemini-2.5-flash",
18
+ retrieve_function=retrieval_engine.retrieve_medical_info
19
+ )
20
+
21
+ @router.post("/chat")
22
+ async def chat_endpoint(req: Request):
23
+ """Main chat endpoint with search mode support"""
24
+ body = await req.json()
25
+ user_id = body.get("user_id", "anonymous")
26
+ query_raw = body.get("query")
27
+ query = query_raw.strip() if isinstance(query_raw, str) else ""
28
+ lang = body.get("lang", "EN")
29
+ search_mode = body.get("search", False)
30
+ image_base64 = body.get("image_base64", None)
31
+ img_desc = body.get("img_desc", "Describe and investigate any clinical findings from this medical image.")
32
+
33
+ start = time.time()
34
+ image_diagnosis = ""
35
+
36
+ # LLM Only
37
+ if not image_base64:
38
+ logger.info(f"[BOT] LLM scenario. Search mode: {search_mode}")
39
+ # LLM+VLM
40
+ else:
41
+ # If image is present → diagnose first
42
+ safe_load = len(image_base64.encode("utf-8"))
43
+ if safe_load > 5_000_000: # Img size safe processor
44
+ return JSONResponse({"response": "⚠️ Image too large. Please upload smaller images (<5MB)."})
45
+ logger.info(f"[BOT] VLM+LLM scenario. Search mode: {search_mode}")
46
+ logger.info(f"[VLM] Process medical image size: {safe_load}, desc: {img_desc}, {lang}.")
47
+ image_diagnosis = process_medical_image(image_base64, img_desc, lang)
48
+
49
+ answer = chatbot.chat(user_id, query, lang, image_diagnosis, search_mode)
50
+ elapsed = time.time() - start
51
+
52
+ # Final
53
+ return JSONResponse({"response": f"{answer}\n\n(Response time: {elapsed:.2f}s)"})
54
+
55
+ @router.get("/health")
56
+ async def health_check():
57
+ """Health check endpoint"""
58
+ return {"status": "healthy", "service": "medical-chatbot"}
59
+
60
+ @router.get("/")
61
+ async def root():
62
+ """Root endpoint"""
63
+ return {"message": "Medical Chatbot API", "version": "1.0.0"}
main.py ADDED
@@ -0,0 +1,6 @@
1
+ # main.py - Entry point for the Medical Chatbot API
2
+ from api.app import app
3
+
4
+ if __name__ == "__main__":
5
+ import uvicorn
6
+ uvicorn.run(app, host="0.0.0.0", port=7860, log_level="info")
memory/__init__.py ADDED
@@ -0,0 +1,2 @@
1
+ # Memory package
2
+ from .memory import MemoryManager
memory.py → memory/memory.py RENAMED
@@ -1,4 +1,4 @@
1
- # memory.py
2
  import re, time, hashlib, asyncio, os
3
  from collections import defaultdict, deque
4
  from typing import List, Dict
@@ -7,6 +7,7 @@ import faiss
7
  from sentence_transformers import SentenceTransformer
8
  from google import genai # must be configured in app.py and imported globally
9
  import logging
 
10
 
11
  _LLM_SMALL = "gemini-2.5-flash-lite-preview-06-17"
12
  # Load embedding model
@@ -98,7 +99,7 @@ class MemoryManager:
98
 
99
  def get_contextual_chunks(self, user_id: str, current_query: str, lang: str = "EN") -> str:
100
  """
101
- Use Gemini Flash Lite to create a summarization of relevant context from both recent history and RAG chunks.
102
  This ensures conversational continuity while providing a concise summary for the main LLM.
103
  """
104
  # Get both types of context
@@ -112,7 +113,8 @@ class MemoryManager:
112
  if not recent_history and not rag_chunks:
113
  logger.info(f"[Contextual] No context found, returning empty string")
114
  return ""
115
- # Prepare context for Gemini to summarize
 
116
  context_parts = []
117
  # Add recent chat history
118
  if recent_history:
@@ -126,301 +128,204 @@ class MemoryManager:
126
  rag_text = "\n".join(rag_chunks)
127
  context_parts.append(f"Semantically relevant historical medical information:\n{rag_text}")
128
 
129
- # Build summarization prompt
130
- summarization_prompt = f"""
131
- You are a medical assistant creating a concise summary of conversation context for continuity.
132
-
133
- Current user query: "{current_query}"
134
-
135
- Available context information:
136
- {chr(10).join(context_parts)}
137
-
138
- Task: Create a brief, coherent summary that captures the key points from the conversation history and relevant medical information that are important for understanding the current query.
139
-
140
- Guidelines:
141
- 1. Focus on medical symptoms, diagnoses, treatments, or recommendations mentioned
142
- 2. Include any patient concerns or questions that are still relevant
143
- 3. Highlight any follow-up needs or pending clarifications
144
- 4. Keep the summary concise but comprehensive enough for context
145
- 5. Maintain conversational flow and continuity
146
 
147
- Output: Provide a single, well-structured summary paragraph that can be used as context for the main LLM to provide a coherent response.
148
- If no relevant context exists, return "No relevant context found."
149
-
150
- Language context: {lang}
 
 
 
 
 
 
 
 
 
151
  """
 
 
152
 
153
- logger.debug(f"[Contextual] Full prompt: {summarization_prompt}")
154
- # Loop through the prompt and log the length of each part
155
  try:
156
- # Use Gemini Flash Lite for summarization
157
- client = genai.Client(api_key=os.getenv("FlashAPI"))
158
- result = client.models.generate_content(
159
- model=_LLM_SMALL,
160
- contents=summarization_prompt
161
- )
162
- summary = result.text.strip()
163
- if "No relevant context found" in summary:
164
- logger.info(f"[Contextual] Gemini indicated no relevant context found")
165
- return ""
166
 
167
- logger.info(f"[Contextual] Gemini created summary: {summary[:100]}...")
168
- return summary
 
 
 
 
 
 
 
 
 
 
 
169
 
170
  except Exception as e:
171
- logger.warning(f"[Contextual] Gemini summarization failed: {e}")
172
- logger.info(f"[Contextual] Using fallback summarization method")
173
- # Fallback: create a simple summary
174
- fallback_summary = []
175
- # Fallback: add recent history
176
- if recent_history:
177
- recent_summary = f"Recent conversation: User asked about {recent_history[-1]['user'][:50]}... and received a response about {recent_history[-1]['bot'][:50]}..."
178
- fallback_summary.append(recent_summary)
179
- logger.info(f"[Contextual] Fallback: Added recent history summary")
180
- # Fallback: add RAG chunks
181
- if rag_chunks:
182
- rag_summary = f"Relevant medical information: {len(rag_chunks)} chunks found covering various medical topics."
183
- fallback_summary.append(rag_summary)
184
- logger.info(f"[Contextual] Fallback: Added RAG chunks summary")
185
- final_fallback = " ".join(fallback_summary) if fallback_summary else ""
186
- return final_fallback
187
 
188
- def reset(self, user_id: str):
189
- self._drop_user(user_id)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
190
 
191
- # ---------- Internal helpers ----------
192
  def _touch_user(self, user_id: str):
193
- if user_id not in self.text_cache and len(self.user_queue) >= self.user_queue.maxlen:
194
- self._drop_user(self.user_queue.popleft())
195
  if user_id in self.user_queue:
196
  self.user_queue.remove(user_id)
197
  self.user_queue.append(user_id)
198
 
199
- def _drop_user(self, user_id: str):
200
- self.text_cache.pop(user_id, None)
201
- self.chunk_index.pop(user_id, None)
202
- self.chunk_meta.pop(user_id, None)
203
- if user_id in self.user_queue:
204
- self.user_queue.remove(user_id)
205
-
206
- def _rebuild_index(self, user_id: str, keep_last: int):
207
- """Trim chunk list + rebuild FAISS index for user."""
208
- self.chunk_meta[user_id] = self.chunk_meta[user_id][-keep_last:]
209
- index = self._new_index()
210
- # Store each chunk's vector once and reuse it.
211
- for chunk in self.chunk_meta[user_id]:
212
- index.add(np.array([chunk["vec"]]))
213
- self.chunk_index[user_id] = index
214
-
215
- @staticmethod
216
- def _new_index():
217
- # Use cosine similarity (vectors must be L2-normalised)
218
- return faiss.IndexFlatIP(384)
219
-
220
- @staticmethod
221
- def _embed(text: str):
222
- vec = EMBED.encode(text, convert_to_numpy=True)
223
- # L2 normalise for cosine on IndexFlatIP
224
- return vec / (np.linalg.norm(vec) + 1e-9)
225
-
226
- def chunk_response(self, response: str, lang: str, question: str = "") -> List[Dict]:
227
- """
228
- Calls Gemini to:
229
- - Translate (if needed)
230
- - Chunk by context/topic (exclude disclaimer section)
231
- - Summarise
232
- Returns: [{"tag": ..., "text": ...}, ...]
233
- """
234
- if not response: return []
235
- # Gemini instruction
236
- instructions = []
237
- # if lang.upper() != "EN":
238
- # instructions.append("- Translate the response to English.")
239
- instructions.append("- Break the translated (or original) text into semantically distinct parts, grouped by medical topic, symptom, assessment, plan, or instruction (exclude disclaimer section).")
240
- instructions.append("- For each part, generate a clear, concise summary. The summary may vary in length depending on the complexity of the topic — do not omit key clinical instructions and exact medication names/doses if present.")
241
- instructions.append("- At the start of each part, write `Topic: <concise but specific sentence (10-20 words) capturing patient context, condition, and action>`.")
242
- instructions.append("- Separate each part using three dashes `---` on a new line.")
243
- # if lang.upper() != "EN":
244
- # instructions.append(f"Below is the user-provided medical response written in `{lang}`")
245
- # Gemini prompt
246
- prompt = f"""
247
- You are a medical assistant helping organize and condense a clinical response.
248
- If helpful, use the user's latest question for context to craft specific topics.
249
- User's latest question (context): {question}
250
- ------------------------
251
- {response}
252
- ------------------------
253
- Please perform the following tasks:
254
- {chr(10).join(instructions)}
255
 
256
- Output only the structured summaries, separated by dashes.
257
- """
258
- retries = 0
259
- while retries < 5:
260
- try:
261
- client = genai.Client(api_key=os.getenv("FlashAPI"))
262
- result = client.models.generate_content(
263
- model=_LLM_SMALL,
264
- contents=prompt
265
- # ,generation_config={"temperature": 0.4} # Skip temp configs for gem-flash
 
 
266
  )
267
- output = result.text.strip()
268
- logger.info(f"[Memory] 📦 Gemini summarized chunk output: {output}")
269
- return [
270
- {"tag": self._quick_extract_topic(chunk), "text": chunk.strip()}
271
- for chunk in output.split('---') if chunk.strip()
272
- ]
273
- except Exception as e:
274
- logger.warning(f"[Memory] ❌ Gemini chunking failed: {e}")
275
- retries += 1
276
- time.sleep(0.5)
277
- return [{"tag": "general", "text": response.strip()}] # fallback
278
 
279
- @staticmethod
280
- def _quick_extract_topic(chunk: str) -> str:
281
- """Heuristically extract the topic from a chunk (title line or first 3 words)."""
282
- # Expecting 'Topic: <something>'
283
- match = re.search(r'^Topic:\s*(.+)', chunk, re.IGNORECASE | re.MULTILINE)
284
- if match:
285
- return match.group(1).strip()
286
- lines = chunk.strip().splitlines()
287
- for line in lines:
288
- if len(line.split()) <= 8 and line.strip().endswith(":"):
289
- return line.strip().rstrip(":")
290
- return " ".join(chunk.split()[:3]).rstrip(":.,")
291
-
292
- # ---------- New merging/dedup logic ----------
293
- def _upsert_stm(self, user_id: str, chunk: Dict, lang: str):
294
- """Insert or merge a summarized chunk into STM with semantic dedup/merge.
295
- Identical: replace the older with new. Partially similar: merge extra details from older into newer.
296
- """
297
- topic = self._enrich_topic(chunk.get("tag", ""), chunk.get("text", ""))
298
- text = chunk.get("text", "").strip()
299
- vec = self._embed(text)
300
- now = time.time()
301
- entry = {"topic": topic, "text": text, "vec": vec, "timestamp": now, "used": 0}
302
- stm = self.stm_summaries[user_id]
303
- if not stm:
304
- stm.append(entry)
305
- return
306
- # find best match
307
- best_idx = -1
308
- best_sim = -1.0
309
- for i, e in enumerate(stm):
310
- sim = float(np.dot(vec, e["vec"]))
311
- if sim > best_sim:
312
- best_sim = sim
313
- best_idx = i
314
- if best_sim >= 0.92: # nearly identical
315
- # replace older with current
316
- stm.rotate(-best_idx)
317
- stm.popleft()
318
- stm.rotate(best_idx)
319
- stm.append(entry)
320
- elif best_sim >= 0.75: # partially similar → merge
321
- base = stm[best_idx]
322
- merged_text = self._merge_texts(new_text=text, old_text=base["text"]) # add bits from old not in new
323
- merged_topic = base["topic"] if len(base["topic"]) > len(topic) else topic
324
- merged_vec = self._embed(merged_text)
325
- merged_entry = {"topic": merged_topic, "text": merged_text, "vec": merged_vec, "timestamp": now, "used": base.get("used", 0)}
326
- stm.rotate(-best_idx)
327
- stm.popleft()
328
- stm.rotate(best_idx)
329
- stm.append(merged_entry)
330
- else:
331
- stm.append(entry)
332
 
333
  def _upsert_ltm(self, user_id: str, chunks: List[Dict], lang: str):
334
- """Insert or merge chunks into LTM with semantic dedup/merge, then rebuild index.
335
- Keeps only the most recent self.max_chunks entries.
336
- """
337
- current_list = self.chunk_meta[user_id]
338
  for chunk in chunks:
339
- text = chunk.get("text", "").strip()
340
- if not text:
341
- continue
342
- vec = self._embed(text)
343
- topic = self._enrich_topic(chunk.get("tag", ""), text)
344
- now = time.time()
345
- new_entry = {"tag": topic, "text": text, "vec": vec, "timestamp": now, "used": 0}
346
- if not current_list:
347
- current_list.append(new_entry)
348
- continue
349
- # find best similar entry
350
- best_idx = -1
351
- best_sim = -1.0
352
- for i, e in enumerate(current_list):
353
- sim = float(np.dot(vec, e["vec"]))
354
- if sim > best_sim:
355
- best_sim = sim
356
- best_idx = i
357
- if best_sim >= 0.92:
358
- # replace older with new
359
- current_list[best_idx] = new_entry
360
- elif best_sim >= 0.75:
361
- # merge details
362
- base = current_list[best_idx]
363
- merged_text = self._merge_texts(new_text=text, old_text=base["text"]) # add unique sentences from old
364
- merged_topic = base["tag"] if len(base["tag"]) > len(topic) else topic
365
- merged_vec = self._embed(merged_text)
366
- current_list[best_idx] = {"tag": merged_topic, "text": merged_text, "vec": merged_vec, "timestamp": now, "used": base.get("used", 0)}
367
  else:
368
- current_list.append(new_entry)
369
- # Trim and rebuild index
370
- if len(current_list) > self.max_chunks:
371
- current_list[:] = current_list[-self.max_chunks:]
372
- self._rebuild_index(user_id, keep_last=self.max_chunks)
373
-
374
- @staticmethod
375
- def _split_sentences(text: str) -> List[str]:
376
- # naive sentence splitter by ., !, ?
377
- parts = re.split(r"(?<=[\.!?])\s+", text.strip())
378
- return [p.strip() for p in parts if p.strip()]
 
 
 
379
 
380
- def _merge_texts(self, new_text: str, old_text: str) -> str:
381
- """Append sentences from old_text that are not already contained in new_text (by fuzzy match)."""
382
- new_sents = self._split_sentences(new_text)
383
- old_sents = self._split_sentences(old_text)
384
- new_set = set(s.lower() for s in new_sents)
385
- merged = list(new_sents)
386
- for s in old_sents:
387
- s_norm = s.lower()
388
- # consider present if significant overlap with any existing sentence
389
- if s_norm in new_set:
390
- continue
391
- # simple containment check
392
- if any(self._overlap_ratio(s_norm, t.lower()) > 0.8 for t in merged):
393
- continue
394
- merged.append(s)
395
- return " ".join(merged)
396
 
397
- @staticmethod
398
- def _overlap_ratio(a: str, b: str) -> float:
399
- """Compute token overlap ratio between two sentences."""
400
- ta = set(re.findall(r"\w+", a))
401
- tb = set(re.findall(r"\w+", b))
402
- if not ta or not tb:
403
- return 0.0
404
- inter = len(ta & tb)
405
- union = len(ta | tb)
406
- return inter / union
 
 
407
 
408
- @staticmethod
409
- def _enrich_topic(topic: str, text: str) -> str:
410
- """Make topic more descriptive if it's too short by using the first sentence of the text.
411
- Does not call LLM to keep latency low.
412
- """
413
- topic = (topic or "").strip()
414
- if len(topic.split()) < 5 or len(topic) < 20:
415
- sents = re.split(r"(?<=[\.!?])\s+", text.strip())
416
- if sents:
417
- first = sents[0]
418
- # cap to ~16 words
419
- words = first.split()
420
- if len(words) > 16:
421
- first = " ".join(words[:16])
422
- # ensure capitalized
423
- return first.strip().rstrip(':')
424
- return topic
425
 
 
 
 
 
 
 
 
 
 
426
 
 
 
 
 
 
 
1
+ # memory/memory.py
2
  import re, time, hashlib, asyncio, os
3
  from collections import defaultdict, deque
4
  from typing import List, Dict
 
7
  from sentence_transformers import SentenceTransformer
8
  from google import genai # must be configured in app.py and imported globally
9
  import logging
10
+ from summarizer import summarizer
11
 
12
  _LLM_SMALL = "gemini-2.5-flash-lite-preview-06-17"
13
  # Load embedding model
 
99
 
100
  def get_contextual_chunks(self, user_id: str, current_query: str, lang: str = "EN") -> str:
101
  """
102
+ Use NVIDIA Llama to create a summarization of relevant context from both recent history and RAG chunks.
103
  This ensures conversational continuity while providing a concise summary for the main LLM.
104
  """
105
  # Get both types of context
 
113
  if not recent_history and not rag_chunks:
114
  logger.info(f"[Contextual] No context found, returning empty string")
115
  return ""
116
+
117
+ # Prepare context for summarization
118
  context_parts = []
119
  # Add recent chat history
120
  if recent_history:
 
128
  rag_text = "\n".join(rag_chunks)
129
  context_parts.append(f"Semantically relevant historical medical information:\n{rag_text}")
130
 
131
+ # Combine all context
132
+ full_context = "\n\n".join(context_parts)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
133
 
134
+ # Use summarizer to create concise summary
135
+ try:
136
+ summary = summarizer.summarize_text(full_context, max_length=300)
137
+ logger.info(f"[Contextual] Generated summary using NVIDIA Llama: {len(summary)} characters")
138
+ return summary
139
+ except Exception as e:
140
+ logger.error(f"[Contextual] Summarization failed: {e}")
141
+ return full_context[:500] + "..." if len(full_context) > 500 else full_context
142
+
143
+ def chunk_response(self, response: str, lang: str, question: str = "") -> List[Dict]:
144
+ """
145
+ Use NVIDIA Llama to chunk and summarize response by medical topics.
146
+ Returns: [{"tag": ..., "text": ...}, ...]
147
  """
148
+ if not response:
149
+ return []
150
 
 
 
151
  try:
152
+ # Use summarizer to chunk and summarize
153
+ chunks = summarizer.chunk_response(response, max_chunk_size=500)
 
 
 
 
 
 
 
 
154
 
155
+ # Convert to the expected format
156
+ result_chunks = []
157
+ for i, chunk in enumerate(chunks):
158
+ # Extract topic from chunk (first sentence or key medical terms)
159
+ topic = self._extract_topic_from_chunk(chunk)
160
+
161
+ result_chunks.append({
162
+ "tag": topic,
163
+ "text": chunk
164
+ })
165
+
166
+ logger.info(f"[Memory] 📦 NVIDIA Llama summarized {len(result_chunks)} chunks")
167
+ return result_chunks
168
 
169
  except Exception as e:
170
+ logger.error(f"[Memory] NVIDIA Llama chunking failed: {e}")
171
+ # Fallback to simple chunking
172
+ return self._fallback_chunking(response)
173
+
174
+ def _extract_topic_from_chunk(self, chunk: str) -> str:
175
+ """Extract a concise topic from a chunk"""
176
+ # Look for medical terms or first sentence
177
+ sentences = chunk.split('.')
178
+ if sentences:
179
+ first_sentence = sentences[0].strip()
180
+ if len(first_sentence) > 50:
181
+ first_sentence = first_sentence[:50] + "..."
182
+ return first_sentence
183
+ return "Medical Information"
 
 
184
 
185
+ def _fallback_chunking(self, response: str) -> List[Dict]:
186
+ """Fallback chunking when NVIDIA Llama fails"""
187
+ # Simple sentence-based chunking
188
+ sentences = re.split(r'[.!?]+', response)
189
+ chunks = []
190
+ current_chunk = ""
191
+
192
+ for sentence in sentences:
193
+ sentence = sentence.strip()
194
+ if not sentence:
195
+ continue
196
+
197
+ if len(current_chunk) + len(sentence) > 300:
198
+ if current_chunk:
199
+ chunks.append({
200
+ "tag": "Medical Information",
201
+ "text": current_chunk.strip()
202
+ })
203
+ current_chunk = sentence
204
+ else:
205
+ current_chunk += sentence + ". "
206
+
207
+ if current_chunk:
208
+ chunks.append({
209
+ "tag": "Medical Information",
210
+ "text": current_chunk.strip()
211
+ })
212
+
213
+ return chunks
214
 
215
+ # ---------- Private Methods ----------
216
  def _touch_user(self, user_id: str):
217
+ """Update LRU queue"""
 
218
  if user_id in self.user_queue:
219
  self.user_queue.remove(user_id)
220
  self.user_queue.append(user_id)
221
 
222
+ def _new_index(self):
223
+ """Create new FAISS index"""
224
+ return faiss.IndexFlatIP(384) # 384-dim embeddings
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
225
 
226
+ def _upsert_stm(self, user_id: str, chunk: Dict, lang: str):
227
+ """Update short-term memory with merging/deduplication"""
228
+ topic = chunk["tag"]
229
+ text = chunk["text"]
230
+
231
+ # Check for similar topics in STM
232
+ for entry in self.stm_summaries[user_id]:
233
+ if self._topics_similar(topic, entry["topic"]):
234
+ # Merge with existing entry
235
+ entry["text"] = summarizer.summarize_text(
236
+ f"{entry['text']}\n{text}",
237
+ max_length=200
238
  )
239
+ entry["timestamp"] = time.time()
240
+ return
 
 
 
 
 
 
 
 
 
241
 
242
+ # Add new entry
243
+ self.stm_summaries[user_id].append({
244
+ "topic": topic,
245
+ "text": text,
246
+ "vec": self._embed(f"{topic} {text}"),
247
+ "timestamp": time.time(),
248
+ "used": 0
249
+ })
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
250
 
251
  def _upsert_ltm(self, user_id: str, chunks: List[Dict], lang: str):
252
+ """Update long-term memory with merging/deduplication"""
 
 
 
253
  for chunk in chunks:
254
+ # Check for similar chunks in LTM
255
+ similar_idx = self._find_similar_chunk(user_id, chunk["text"])
256
+
257
+ if similar_idx is not None:
258
+ # Merge with existing chunk
259
+ existing = self.chunk_meta[user_id][similar_idx]
260
+ merged_text = summarizer.summarize_text(
261
+ f"{existing['text']}\n{chunk['text']}",
262
+ max_length=300
263
+ )
264
+ existing["text"] = merged_text
265
+ existing["timestamp"] = time.time()
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
266
  else:
267
+ # Add new chunk
268
+ if len(self.chunk_meta[user_id]) >= self.max_chunks:
269
+ # Remove oldest chunk
270
+ self._remove_oldest_chunk(user_id)
271
+
272
+ vec = self._embed(chunk["text"])
273
+ self.chunk_index[user_id].add(np.array([vec]))
274
+ self.chunk_meta[user_id].append({
275
+ "text": chunk["text"],
276
+ "tag": chunk["tag"],
277
+ "vec": vec,
278
+ "timestamp": time.time(),
279
+ "used": 0
280
+ })
281
 
282
+ def _topics_similar(self, topic1: str, topic2: str) -> bool:
283
+ """Check if two topics are similar"""
284
+ # Simple similarity check based on common words
285
+ words1 = set(topic1.lower().split())
286
+ words2 = set(topic2.lower().split())
287
+ intersection = words1.intersection(words2)
288
+ return len(intersection) >= 2
 
 
 
 
 
 
 
 
 
289
 
290
+ def _find_similar_chunk(self, user_id: str, text: str) -> int:
291
+ """Find similar chunk in LTM"""
292
+ if not self.chunk_meta[user_id]:
293
+ return None
294
+
295
+ text_vec = self._embed(text)
296
+ sims, idxs = self.chunk_index[user_id].search(np.array([text_vec]), k=3)
297
+
298
+ for sim, idx in zip(sims[0], idxs[0]):
299
+ if sim > 0.8: # High similarity threshold
300
+ return int(idx)
301
+ return None
302
 
303
+ def _remove_oldest_chunk(self, user_id: str):
304
+ """Remove the oldest chunk from LTM"""
305
+ if not self.chunk_meta[user_id]:
306
+ return
307
+
308
+ # Find oldest chunk
309
+ oldest_idx = min(range(len(self.chunk_meta[user_id])),
310
+ key=lambda i: self.chunk_meta[user_id][i]["timestamp"])
311
+
312
+ # Remove from index and metadata
313
+ self.chunk_meta[user_id].pop(oldest_idx)
314
+ # Note: FAISS doesn't support direct removal, so we rebuild the index
315
+ self._rebuild_index(user_id)
 
 
 
 
316
 
317
+ def _rebuild_index(self, user_id: str):
318
+ """Rebuild FAISS index after removal"""
319
+ if not self.chunk_meta[user_id]:
320
+ self.chunk_index[user_id] = self._new_index()
321
+ return
322
+
323
+ vectors = [chunk["vec"] for chunk in self.chunk_meta[user_id]]
324
+ self.chunk_index[user_id] = self._new_index()
325
+ self.chunk_index[user_id].add(np.array(vectors))
326
 
327
+ @staticmethod
328
+ def _embed(text: str):
329
+ vec = EMBED.encode(text, convert_to_numpy=True)
330
+ # L2 normalise for cosine on IndexFlatIP
331
+ return vec / (np.linalg.norm(vec) + 1e-9)
models/__init__.py ADDED
@@ -0,0 +1,3 @@
1
+ # Models package
2
+ from .llama import NVIDIALLamaClient, process_search_query
3
+ from .summarizer import TextSummarizer, summarizer
download_model.py → models/download_model.py RENAMED
File without changes
llama_integration.py → models/llama.py RENAMED
@@ -2,6 +2,7 @@ import os
  import requests
  import json
  import logging
+ import time
  from typing import List, Dict, Tuple
 
  logger = logging.getLogger(__name__)
@@ -40,27 +41,11 @@ Keywords:"""
      def summarize_documents(self, documents: List[Dict], user_query: str) -> Tuple[str, Dict[int, str]]:
          """Use Llama to summarize documents and return summary with URL mapping"""
          try:
-             # Create document summaries
-             doc_summaries = []
-             url_mapping = {}
-
-             for doc in documents:
-                 doc_id = doc['id']
-                 url_mapping[doc_id] = doc['url']
-
-                 # Create a summary prompt for each document
-                 summary_prompt = f"""Summarize this medical information in 2-3 sentences, focusing on details relevant to: "{user_query}"
-
- Document: {doc['title']}
- Content: {doc['content'][:1000]}...
-
- Summary:"""
-
-                 summary = self._call_llama(summary_prompt)
-                 doc_summaries.append(f"Document {doc_id}: {summary}")
-
-             # Combine all summaries
-             combined_summary = "\n\n".join(doc_summaries)
+             # Import summarizer here to avoid circular imports
+             from .summarizer import summarizer
+
+             # Use the summarizer for document summarization
+             combined_summary, url_mapping = summarizer.summarize_documents(documents, user_query)
 
              return combined_summary, url_mapping
 
@@ -68,41 +53,58 @@ Summary:"""
              logger.error(f"Failed to summarize documents: {e}")
              return "", {}
 
-     def _call_llama(self, prompt: str) -> str:
-         """Make API call to NVIDIA Llama model"""
-         try:
-             headers = {
-                 "Authorization": f"Bearer {self.api_key}",
-                 "Content-Type": "application/json"
-             }
-
-             payload = {
-                 "model": self.model,
-                 "messages": [
-                     {
-                         "role": "user",
-                         "content": prompt
-                     }
-                 ],
-                 "temperature": 0.7,
-                 "max_tokens": 1000
-             }
-
-             response = requests.post(
-                 self.base_url,
-                 headers=headers,
-                 json=payload,
-                 timeout=30
-             )
-
-             response.raise_for_status()
-             result = response.json()
-
-             return result['choices'][0]['message']['content'].strip()
-
-         except Exception as e:
-             logger.error(f"Llama API call failed: {e}")
-             raise
+     def _call_llama(self, prompt: str, max_retries: int = 3) -> str:
+         """Make API call to NVIDIA Llama model with retry logic"""
+         for attempt in range(max_retries):
+             try:
+                 headers = {
+                     "Authorization": f"Bearer {self.api_key}",
+                     "Content-Type": "application/json"
+                 }
+
+                 payload = {
+                     "model": self.model,
+                     "messages": [
+                         {
+                             "role": "user",
+                             "content": prompt
+                         }
+                     ],
+                     "temperature": 0.7,
+                     "max_tokens": 1000
+                 }
+
+                 response = requests.post(
+                     self.base_url,
+                     headers=headers,
+                     json=payload,
+                     timeout=30
+                 )
+
+                 response.raise_for_status()
+                 result = response.json()
+
+                 content = result['choices'][0]['message']['content'].strip()
+                 if not content:
+                     raise ValueError("Empty response from Llama API")
+
+                 return content
+
+             except requests.exceptions.Timeout:
+                 logger.warning(f"Llama API timeout (attempt {attempt + 1}/{max_retries})")
+                 if attempt == max_retries - 1:
+                     raise
+                 time.sleep(2 ** attempt)  # Exponential backoff
+
+             except requests.exceptions.RequestException as e:
+                 logger.warning(f"Llama API request failed (attempt {attempt + 1}/{max_retries}): {e}")
+                 if attempt == max_retries - 1:
+                     raise
+                 time.sleep(2 ** attempt)
+
+             except Exception as e:
+                 logger.error(f"Llama API call failed: {e}")
+                 raise
 
  def process_search_query(user_query: str, search_results: List[Dict]) -> Tuple[str, Dict[int, str]]:
      """Process search results using Llama model"""
models/summarizer.py ADDED
@@ -0,0 +1,185 @@
+ import re
+ import logging
+ from typing import List, Dict, Tuple
+ from .llama import NVIDIALLamaClient
+
+ logger = logging.getLogger(__name__)
+
+ class TextSummarizer:
+     def __init__(self):
+         self.llama_client = NVIDIALLamaClient()
+
+     def clean_text(self, text: str) -> str:
+         """Clean and normalize text for summarization"""
+         if not text:
+             return ""
+
+         # Remove common conversation starters and fillers
+         conversation_patterns = [
+             r'\b(hi|hello|hey|sure|okay|yes|no|thanks|thank you)\b',
+             r'\b(here is|this is|let me|i will|i can|i would)\b',
+             r'\b(summarize|summary|here\'s|here is)\b',
+             r'\b(please|kindly|would you|could you)\b',
+             r'\b(um|uh|er|ah|well|so|like|you know)\b'
+         ]
+
+         # Remove excessive whitespace and normalize
+         text = re.sub(r'\s+', ' ', text)
+         text = re.sub(r'\n+', ' ', text)
+
+         # Remove conversation patterns
+         for pattern in conversation_patterns:
+             text = re.sub(pattern, '', text, flags=re.IGNORECASE)
+
+         # Remove extra punctuation and normalize
+         text = re.sub(r'[.]{2,}', '.', text)
+         text = re.sub(r'[!]{2,}', '!', text)
+         text = re.sub(r'[?]{2,}', '?', text)
+
+         return text.strip()
+
+     def extract_key_phrases(self, text: str) -> List[str]:
+         """Extract key medical phrases and terms"""
+         if not text:
+             return []
+
+         # Medical term patterns
+         medical_patterns = [
+             r'\b(?:symptoms?|diagnosis|treatment|therapy|medication|drug|disease|condition|syndrome)\b',
+             r'\b(?:patient|doctor|physician|medical|clinical|healthcare)\b',
+             r'\b(?:blood pressure|heart rate|temperature|pulse|respiration)\b',
+             r'\b(?:acute|chronic|severe|mild|moderate|serious|critical)\b',
+             r'\b(?:pain|ache|discomfort|swelling|inflammation|infection)\b'
+         ]
+
+         key_phrases = []
+         for pattern in medical_patterns:
+             matches = re.findall(pattern, text, re.IGNORECASE)
+             key_phrases.extend(matches)
+
+         return list(set(key_phrases))  # Remove duplicates
+
+     def summarize_text(self, text: str, max_length: int = 200) -> str:
+         """Summarize text using NVIDIA Llama model"""
+         try:
+             if not text or len(text.strip()) < 50:
+                 return text
+
+             # Clean the text first
+             cleaned_text = self.clean_text(text)
+
+             # Extract key phrases for context
+             key_phrases = self.extract_key_phrases(cleaned_text)
+             key_phrases_str = ", ".join(key_phrases[:5]) if key_phrases else "medical information"
+
+             # Create optimized prompt
+             prompt = f"""Summarize this medical text in {max_length} characters or less. Focus only on key medical facts, symptoms, treatments, and diagnoses. Do not include greetings, confirmations, or conversational elements.
+
+ Key terms: {key_phrases_str}
+
+ Text: {cleaned_text[:1500]}
+
+ Summary:"""
+
+             summary = self.llama_client._call_llama(prompt)
+
+             # Post-process summary
+             summary = self.clean_text(summary)
+
+             # Ensure it's within length limit
+             if len(summary) > max_length:
+                 summary = summary[:max_length-3] + "..."
+
+             return summary
+
+         except Exception as e:
+             logger.error(f"Summarization failed: {e}")
+             # Fallback to simple truncation
+             return self.clean_text(text)[:max_length]
+
+     def summarize_documents(self, documents: List[Dict], user_query: str) -> Tuple[str, Dict[int, str]]:
+         """Summarize multiple documents with URL mapping"""
+         try:
+             doc_summaries = []
+             url_mapping = {}
+
+             for doc in documents:
+                 doc_id = doc['id']
+                 url_mapping[doc_id] = doc['url']
+
+                 # Create focused summary for each document
+                 summary_prompt = f"""Summarize this medical document in 2-3 sentences, focusing on information relevant to: "{user_query}"
+
+ Document: {doc['title']}
+ Content: {doc['content'][:800]}
+
+ Key medical information:"""
+
+                 summary = self.llama_client._call_llama(summary_prompt)
+                 summary = self.clean_text(summary)
+
+                 doc_summaries.append(f"Document {doc_id}: {summary}")
+
+             combined_summary = "\n\n".join(doc_summaries)
+             return combined_summary, url_mapping
+
+         except Exception as e:
+             logger.error(f"Document summarization failed: {e}")
+             return "", {}
+
+     def summarize_conversation_chunk(self, chunk: str) -> str:
+         """Summarize a conversation chunk for memory"""
+         try:
+             if not chunk or len(chunk.strip()) < 30:
+                 return chunk
+
+             cleaned_chunk = self.clean_text(chunk)
+
+             prompt = f"""Summarize this medical conversation in 1-2 sentences. Focus only on medical facts, symptoms, treatments, or diagnoses discussed. Remove greetings and conversational elements.
+
+ Conversation: {cleaned_chunk[:1000]}
+
+ Medical summary:"""
+
+             summary = self.llama_client._call_llama(prompt)
+             return self.clean_text(summary)
+
+         except Exception as e:
+             logger.error(f"Conversation summarization failed: {e}")
+             return self.clean_text(chunk)[:150]
+
+     def chunk_response(self, response: str, max_chunk_size: int = 500) -> List[str]:
+         """Split response into chunks and summarize each"""
+         try:
+             if not response or len(response) <= max_chunk_size:
+                 return [response]
+
+             # Split by sentences first
+             sentences = re.split(r'[.!?]+', response)
+             chunks = []
+             current_chunk = ""
+
+             for sentence in sentences:
+                 sentence = sentence.strip()
+                 if not sentence:
+                     continue
+
+                 # Check if adding this sentence would exceed limit
+                 if len(current_chunk) + len(sentence) > max_chunk_size and current_chunk:
+                     chunks.append(self.summarize_conversation_chunk(current_chunk))
+                     current_chunk = sentence
+                 else:
+                     current_chunk += sentence + ". "
+
+             # Add the last chunk
+             if current_chunk:
+                 chunks.append(self.summarize_conversation_chunk(current_chunk))
+
+             return chunks
+
+         except Exception as e:
+             logger.error(f"Response chunking failed: {e}")
+             return [response]
+
+ # Global summarizer instance
+ summarizer = TextSummarizer()
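`chunk_response` splits a long answer on sentence boundaries, accumulating sentences until a chunk reaches `max_chunk_size`, then summarises each chunk so memory stores short summaries rather than full responses. A hedged sketch of just the accumulation logic, with the Llama summarisation step left out (the sample text and function name are invented for illustration):

```python
# Sketch of the sentence-accumulation step behind chunk_response;
# the real implementation summarises each finished chunk via Llama.
import re
from typing import List

def split_into_chunks(response: str, max_chunk_size: int = 500) -> List[str]:
    if not response or len(response) <= max_chunk_size:
        return [response]
    chunks, current = [], ""
    for sentence in re.split(r'[.!?]+', response):
        sentence = sentence.strip()
        if not sentence:
            continue
        if len(current) + len(sentence) > max_chunk_size and current:
            chunks.append(current)  # real code would summarise here
            current = sentence
        else:
            current += sentence + ". "
    if current:
        chunks.append(current)
    return chunks

print(split_into_chunks("First finding. Second finding. " * 40, max_chunk_size=200))
```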
warmup.py → models/warmup.py RENAMED
File without changes
search/__init__.py ADDED
@@ -0,0 +1,2 @@
+ # Search package
+ from .search import WebSearcher, search_web
search.py → search/search.py RENAMED
@@ -96,39 +96,54 @@ class WebSearcher:
              logger.warning(f"Failed to extract content from {url}: {e}")
              return ""
 
-     def search_and_extract(self, query: str, num_results: int = 5) -> List[Dict]:
+     def search_and_extract(self, query: str, num_results: int = 10) -> List[Dict]:
          """Search for query and extract content from top results"""
          logger.info(f"Searching for: {query}")
 
-         # Get search results
-         search_results = self.search_duckduckgo(query, num_results)
+         # Get search results (fetch more than needed for filtering)
+         search_results = self.search_duckduckgo(query, min(num_results * 2, 20))
 
-         # Extract content from each result
+         # Extract content from each result, tracking consecutive failures
          enriched_results = []
+         failed_count = 0
+         max_failures = 5  # Stop after 5 consecutive failures
+
          for i, result in enumerate(search_results):
+             if len(enriched_results) >= num_results:
+                 break
+
+             if failed_count >= max_failures:
+                 logger.warning(f"Too many failures ({failed_count}), stopping extraction")
+                 break
+
              try:
                  logger.info(f"Extracting content from {result['url']}")
                  content = self.extract_content(result['url'])
 
-                 if content:
+                 if content and len(content.strip()) > 50:  # Only include substantial content
                      enriched_results.append({
-                         'id': i + 1,
+                         'id': len(enriched_results) + 1,  # Sequential ID
                          'url': result['url'],
                          'title': result['title'],
                          'content': content
                      })
+                     failed_count = 0  # Reset failure counter
+                 else:
+                     failed_count += 1
+                     logger.warning(f"Insufficient content from {result['url']}")
 
                  # Add delay to be respectful
-                 time.sleep(1)
+                 time.sleep(0.5)  # Reduced delay for better performance
 
             except Exception as e:
+                 failed_count += 1
                  logger.warning(f"Failed to process {result['url']}: {e}")
                  continue
 
-         logger.info(f"Successfully processed {len(enriched_results)} results")
+         logger.info(f"Successfully processed {len(enriched_results)} results out of {len(search_results)} attempted")
          return enriched_results
 
- def search_web(query: str, num_results: int = 5) -> List[Dict]:
+ def search_web(query: str, num_results: int = 10) -> List[Dict]:
      """Main function to search the web and return enriched results"""
      searcher = WebSearcher()
      return searcher.search_and_extract(query, num_results)
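With the new defaults, `search_web` aims for up to 10 substantial documents, assigning sequential `id`s that the Llama summariser later maps back to URLs for citations. A hedged usage sketch (the query is invented; it assumes the `search` package above is importable and that network results are available):

```python
# Sketch: consuming search_web results and building the id -> URL citation map.
from search import search_web

results = search_web("early symptoms of type 2 diabetes", num_results=10)
url_mapping = {doc["id"]: doc["url"] for doc in results}

for doc in results:
    print(f"[{doc['id']}] {doc['title']} -> {doc['url']} ({len(doc['content'])} chars)")
```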
utils/__init__.py ADDED
@@ -0,0 +1,4 @@
+ # Utils package
+ from .translation import translate_query
+ from .vlm import process_medical_image
+ from .diagnosis import retrieve_diagnosis_from_symptoms
clear_mongo.py → utils/clear_mongo.py RENAMED
File without changes
connect_mongo.py → utils/connect_mongo.py RENAMED
File without changes
diagnosis.py → utils/diagnosis.py RENAMED
File without changes
migrate.py → utils/migrate.py RENAMED
File without changes
translation.py → utils/translation.py RENAMED
File without changes
vlm.py → utils/vlm.py RENAMED
File without changes