LiamKhoaLe committed
Commit c379f08 · 1 Parent(s): 927a9b8

Upd README

Files changed (1)
  1. README.md +51 -61
README.md CHANGED
@@ -58,10 +58,13 @@ tags:
  - **Inline Mic Experience**: Built-in microphone widget with live recording timer that drops transcripts straight into the chat box
  - **Text-to-Speech**: Voice output generation using Maya1 TTS model (optional, fallback to MCP if unavailable) plus a one-click "Play Response" control for the latest answer

- ### 🛡️ **Autonomous Guardrails**
- - **Gemini Supervisor Tasks**: Time-aware directives keep MedSwin within token budgets and can fast-track by skipping optional web search
- - **Self-Reflection Loop**: Gemini MCP scores complex answers and appends improvement hints when quality drops
- - **Automatic Citations**: Web-grounded replies include deduplicated source links from the latest search batch
  - **Deterministic Mode**: `Disable agentic reasoning` switch runs MedSwin alone for offline-friendly, model-only answers

  ### ⚙️ **Advanced Configuration**
@@ -88,11 +91,12 @@ tags:
  ## 🔧 Technical Details

  - **Medical Models**: MedSwin/MedSwin-7B-SFT, MedSwin-7B-KD, MedSwin-Merged-TA-SFT-0.7
  - **Translation**: Gemini MCP (gemini-2.5-flash-lite)
  - **Document Parsing**: Gemini MCP (PDF, Word, TXT, MD, JSON, XML, CSV)
  - **Speech-to-Text**: Gemini MCP (gemini-2.5-flash-lite)
- - **Summarization**: Gemini MCP (gemini-2.5-flash)
- - **Reasoning & Reflection**: Gemini MCP (gemini-2.5-flash)
  - **Text-to-Speech**: maya-research/maya1 (optional, with MCP fallback)
  - **Embedding Model**: abhinand/MedEmbed-large-v0.1 (domain-tuned medical embeddings)
  - **RAG Framework**: LlamaIndex with hierarchical node parsing and auto-merging retrieval
@@ -146,35 +150,24 @@ MedLLM Agent is designed to support **doctors, clinicians, and medical specialis

  ### **Key Enterprise Capabilities**

- #### 1. **Autonomous Reasoning & Planning**
- - **Intelligent Query Analysis**: The system autonomously analyzes medical queries to understand:
- - Query type (diagnosis, treatment, drug information, symptom analysis)
- - Complexity level (simple, moderate, complex, multi-faceted)
- - Information requirements and data sources needed

- - **Multi-Step Execution Planning**: For complex clinical questions, the system:
- - Breaks down queries into sub-questions
- - Creates structured execution plans
- - Determines optimal information gathering strategies
- - Adapts approach based on query complexity
-
- #### 2. **Autonomous Decision-Making**
- - **Smart Resource Selection**: The system autonomously decides:
- - When to use document RAG vs. web search
- - When both sources are needed for comprehensive answers
- - Optimal retrieval parameters based on query characteristics
-
- - **Context-Aware Execution**: Automatically:
- - Overrides user settings when reasoning suggests better approaches
- - Combines multiple information sources intelligently
- - Prioritizes evidence-based medical sources
-
- #### 3. **Self-Reflection & Quality Assurance**
- - **Answer Quality Evaluation**: For complex queries, the system:
- - Self-evaluates answer completeness and accuracy
- - Identifies missing information or aspects
- - Provides improvement suggestions
- - Ensures high-quality clinical responses

  ### **Enterprise Use Cases for Medical Specialists**

@@ -201,33 +194,30 @@ MedLLM Agent is designed to support **doctors, clinicians, and medical specialis
  - Current medical guidelines
  - **MCP Protocol Benefits**: Standardized, modular tool integration allows easy switching between search providers and enhanced reliability

- ### **How It Works: Autonomous Reasoning in Action**
-
- 1. **Query Analysis** → System analyzes: "What are the treatment options for Type 2 diabetes in elderly patients with renal impairment?"
- - Identifies as complex, multi-faceted query
- - Determines need for both RAG (patient records) and web search (current guidelines)
- - Breaks into sub-questions: treatment options, age considerations, renal function impact
-
- 2. **Autonomous Planning** → Creates execution plan:
- - Step 1: Language detection/translation
- - Step 2: RAG retrieval from patient documents
- - Step 3: Web search for current diabetes treatment guidelines
- - Step 4: Multi-step reasoning for each sub-question
- - Step 5: Synthesis of comprehensive answer
- - Step 6: Self-reflection on answer quality
-
- 3. **Autonomous Execution** → System executes plan:
- - Retrieves relevant patient history from documents (parsed via Gemini MCP)
- - Searches web for latest ADA/ADA-EASD guidelines using MCP tools
- - Fetches and extracts full content from search results via MCP
- - Summarizes web content using Gemini MCP
- - Synthesizes information considering age and renal function
- - Generates evidence-based treatment recommendations
-
- 4. **Self-Reflection** → Evaluates answer:
- - Checks completeness (all sub-questions addressed?)
- - Verifies accuracy (evidence-based?)
- - Suggests improvements if needed

  ### **Enterprise Benefits**

 
  - **Inline Mic Experience**: Built-in microphone widget with live recording timer that drops transcripts straight into the chat box
  - **Text-to-Speech**: Voice output generation using Maya1 TTS model (optional, fallback to MCP if unavailable) plus a one-click "Play Response" control for the latest answer

+ ### 🤝 **MAC Architecture (Multi-Agent Collaboration)**
+ - **Gemini Supervisor**: Orchestrates query processing by breaking queries into 2-4 focused sub-topics (JSON format)
+ - **MedSwin Specialist**: Executes tasks sequentially, providing concise clinical answers
+ - **Search Mode**: Gemini creates 1-4 search strategies → executes ddgs searches (1-2 sources each) → summarizes briefly
+ - **RAG Mode**: Gemini brainstorms retrieved documents into 1-4 short contexts for MedSwin decision-making
+ - **Clean Output**: All internal thoughts/conversations are logged only; users see only the final answer
+ - **Markdown Format**: Final answers use bullet points (tables automatically converted)
  - **Deterministic Mode**: `Disable agentic reasoning` switch runs MedSwin alone for offline-friendly, model-only answers

  ### ⚙️ **Advanced Configuration**
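
Reviewer note on the added lines in this hunk: they say the Gemini Supervisor returns 2-4 sub-topics as JSON, but the README does not show the schema. The following is only a minimal sketch of how such a reply could be validated. The `sub_topics` key and the `parse_subtopics` helper are hypothetical names chosen for illustration, not taken from the repository.

```python
import json

MIN_SUBTOPICS, MAX_SUBTOPICS = 2, 4  # bounds quoted from the README text


def parse_subtopics(raw: str) -> list[str]:
    """Validate a supervisor reply and return its sub-topics.

    Assumes a hypothetical schema: {"sub_topics": ["...", "..."]}.
    """
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    items = data.get("sub_topics", [])  # assumed key name
    if not isinstance(items, list):
        raise ValueError("expected 'sub_topics' to be a list")
    # Drop empty entries and normalise whitespace.
    topics = [str(t).strip() for t in items if str(t).strip()]
    if len(topics) < MIN_SUBTOPICS:
        raise ValueError(
            f"expected at least {MIN_SUBTOPICS} sub-topics, got {len(topics)}"
        )
    # Truncate over-long plans instead of failing.
    return topics[:MAX_SUBTOPICS]


reply = (
    '{"sub_topics": ["treatment options", '
    '"age considerations", "renal function impact"]}'
)
subtopics = parse_subtopics(reply)
```

Truncating to four entries rather than raising is a design choice of this sketch; a stricter caller could reject over-long plans instead.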
 
  ## 🔧 Technical Details

  - **Medical Models**: MedSwin/MedSwin-7B-SFT, MedSwin-7B-KD, MedSwin-Merged-TA-SFT-0.7
+ - **Architecture**: MAC (Multi-Agent Collaboration) - Gemini Supervisor + MedSwin Specialist
  - **Translation**: Gemini MCP (gemini-2.5-flash-lite)
  - **Document Parsing**: Gemini MCP (PDF, Word, TXT, MD, JSON, XML, CSV)
  - **Speech-to-Text**: Gemini MCP (gemini-2.5-flash-lite)
+ - **Supervisor Tasks**: Gemini MCP (gemini-2.5-flash) - query breakdown, search strategies, RAG brainstorming
+ - **MedSwin Execution**: GPU-tagged tasks for efficient inference
  - **Text-to-Speech**: maya-research/maya1 (optional, with MCP fallback)
  - **Embedding Model**: abhinand/MedEmbed-large-v0.1 (domain-tuned medical embeddings)
  - **RAG Framework**: LlamaIndex with hierarchical node parsing and auto-merging retrieval
 

  ### **Key Enterprise Capabilities**

+ #### 1. **MAC Architecture (Multi-Agent Collaboration)**
+ - **Gemini Supervisor Agent**:
+ - Breaks user queries into 2-4 focused sub-topics (JSON format)
+ - In search mode: creates 1-4 search strategies, executes ddgs (1-2 sources each), summarizes briefly
+ - In RAG mode: brainstorms retrieved documents into 1-4 concise contexts
+ - All supervisor decisions logged internally, not displayed

+ - **MedSwin Specialist Agent**:
+ - Executes tasks assigned by Gemini Supervisor (GPU-tagged)
+ - Processes each sub-topic sequentially with focused context
+ - Generates concise, clinically accurate answers
+ - Returns Markdown format with bullet points (tables auto-converted)
+
+ #### 2. **Clean User Experience**
+ - **Internal Thoughts Hidden**: All Gemini-MedSwin conversations logged only
+ - **Final Answer Only**: Users see only the polished, final answer
+ - **Structured Output**: Markdown bullets, no internal planning tables
+ - **Efficient Processing**: Contexts kept brief to respect token limits

  ### **Enterprise Use Cases for Medical Specialists**
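
Reviewer note on the added lines in this hunk: they describe the specialist processing each sub-topic sequentially with a focused context, while internal exchanges go to logs only. A rough sketch of that control flow follows, under stated assumptions: `run_specialist`, `stub_model`, and the `"mac"` logger name are hypothetical, and the stub stands in for the real MedSwin call, which this diff does not show.

```python
import logging
from typing import Callable

# Internal exchanges go to a logger rather than to the user-facing answer.
log = logging.getLogger("mac")  # hypothetical logger name


def run_specialist(
    subtopics: list[str],
    contexts: dict[str, str],
    answer_fn: Callable[[str, str], str],
) -> list[str]:
    """Answer each sub-topic in order, passing only that sub-topic's context."""
    answers = []
    for topic in subtopics:
        context = contexts.get(topic, "")  # empty context if none was prepared
        log.debug("task=%r context_chars=%d", topic, len(context))
        answers.append(answer_fn(topic, context))
    return answers


def stub_model(topic: str, context: str) -> str:
    """Placeholder for the specialist model; returns a single Markdown bullet."""
    return f"- {topic}: answered using {len(context)} chars of context"


answers = run_specialist(
    ["treatment options", "age considerations"],
    {"treatment options": "brief summary"},
    stub_model,
)
```

Passing the model as a callable keeps the loop testable without a GPU; whether the actual code is structured this way is not visible from the README changes.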
 
 
  - Current medical guidelines
  - **MCP Protocol Benefits**: Standardized, modular tool integration allows easy switching between search providers and enhanced reliability

+ ### **How It Works: MAC Architecture in Action**
+
+ 1. **Gemini Supervisor - Query Breakdown** → Analyzes query and breaks into 2-4 sub-topics (JSON):
+ - Example: "What are the treatment options for Type 2 diabetes in elderly patients with renal impairment?"
+ - Creates structured sub-topics: treatment options, age considerations, renal function impact
+ - All planning logged internally, not displayed to user
+
+ 2. **Gemini Supervisor - Context Preparation**:
+ - **Search Mode**: Creates 1-4 search strategies → executes ddgs (1-2 sources each) → summarizes briefly
+ - **RAG Mode**: Retrieves documents → brainstorms into 1-4 concise contexts for MedSwin
+ - Contexts kept brief to respect MedSwin token limits
+
+ 3. **MedSwin Specialist - Task Execution** (GPU-tagged):
+ - Executes each sub-topic task sequentially
+ - Receives focused context from Gemini Supervisor
+ - Generates concise clinical answers (Markdown bullets, no tables)
+ - All execution logged internally
+
+ 4. **Final Answer Assembly**:
+ - Combines all MedSwin task answers
+ - Converts any tables to Markdown bullets
+ - Adds citations if web sources used
+ - Translates back if needed
+ - **Only final answer displayed** - all internal thoughts remain in logs

  ### **Enterprise Benefits**

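
Reviewer note on step 4 of the added "How It Works" section: it combines task answers, converts tables to bullets, and appends citations. The sketch below shows one way that could look, assuming plain pipe-style Markdown tables. `table_to_bullets` and `assemble_answer` are hypothetical names, the sample data is placeholder text, and the translate-back step is left out because the README gives no detail on it.

```python
def table_to_bullets(markdown: str) -> str:
    """Rewrite pipe-table rows as bullets; leave all other lines unchanged."""
    out, header = [], None
    for line in markdown.splitlines():
        stripped = line.strip()
        is_row = (
            len(stripped) >= 2
            and stripped.startswith("|")
            and stripped.endswith("|")
        )
        if not is_row:
            out.append(line)
            header = None  # any non-table line ends the current table
            continue
        cells = [c.strip() for c in stripped[1:-1].split("|")]
        if header is None:
            header = cells  # first row of a table is treated as its header
        elif all(c and set(c) <= set("-: ") for c in cells):
            continue  # skip the |---|---| separator row
        else:
            pairs = [f"{h}: {c}" for h, c in zip(header, cells)]
            out.append("- " + "; ".join(pairs))
    return "\n".join(out)


def assemble_answer(task_answers: list[str], sources: list[str]) -> str:
    """Join per-task answers and append de-duplicated sources, if any."""
    body = "\n\n".join(table_to_bullets(a) for a in task_answers)
    if sources:
        unique = list(dict.fromkeys(sources))  # de-duplicate, keep first-seen order
        body += "\n\nSources:\n" + "\n".join(f"- {s}" for s in unique)
    return body


table = "| Option | Note |\n|---|---|\n| Option A | placeholder note |"
final = assemble_answer(["- first answer", table], ["src-1", "src-1", "src-2"])
```

Rows are flattened to `header: value` pairs so no information is lost when the table goes away; the project's actual converter may format bullets differently.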