LiamKhoaLe committed on
Commit
d4c1bbe
·
1 Parent(s): ec4d4b3

Use README

Files changed (1)
  1. README.md +46 -71
README.md CHANGED
@@ -27,7 +27,7 @@ tags:
27
  ### 📄 **Document RAG (Retrieval-Augmented Generation)**
28
  - Upload medical documents (PDF, Word, TXT, MD, JSON, XML, CSV) and get answers based on your uploaded content
29
  - Document parsing powered by Gemini MCP for accurate text extraction
30
- - Hierarchical document indexing with auto-merging retrieval
31
  - Mitigates hallucination by grounding responses in your documents
32
  - Toggle RAG on/off - when disabled, provides concise clinical answers without document context
33
 
@@ -40,9 +40,9 @@ tags:
40
  - **Enriches Context**: Combines document RAG + web sources for comprehensive answers
41
 
42
  ### 🧠 **MedSwin Medical Specialist Models**
43
- - **MedSwin SFT** (default) - Supervised Fine-Tuned model
 
44
  - **MedSwin KD** - Knowledge Distillation model
45
- - **MedSwin TA** - Task-Aware merged model
46
  - Models download on-demand for efficient resource usage
47
  - Fine-tuned on MedAlpaca-7B for medical domain expertise
48
 
@@ -54,9 +54,8 @@ tags:
54
  - Powered by Gemini MCP for translation
55
 
56
  ### 🎤 **Voice Features**
57
- - **Speech-to-Text**: Microphone icon for voice input transcription using Gemini MCP
58
- - **Text-to-Speech**: Speaker icon in responses to generate voice output using Maya1 TTS model
59
- - Speech-to-text powered by Gemini MCP for accurate transcription
60
 
61
  ### ⚙️ **Advanced Configuration**
62
  - Customizable generation parameters (temperature, top-p, top-k)
@@ -81,94 +80,71 @@ tags:
81
  ## 🔧 Technical Details
82
 
83
  - **Medical Models**: MedSwin/MedSwin-7B-SFT, MedSwin-7B-KD, MedSwin-Merged-TA-SFT-0.7
84
- - **Translation**: Gemini MCP (gemini-2.5-flash-lite for simple tasks)
85
- - **Document Parsing**: Gemini MCP (supports PDF, Word, TXT, MD, JSON, XML, CSV)
86
  - **Speech-to-Text**: Gemini MCP (gemini-2.5-flash-lite)
87
- - **Summarization**: Gemini MCP (gemini-2.5-flash for complex tasks)
88
  - **Reasoning & Reflection**: Gemini MCP (gemini-2.5-flash)
89
- - **Text-to-Speech**: maya-research/maya1
90
  - **Embedding Model**: abhinand/MedEmbed-large-v0.1 (domain-tuned medical embeddings)
91
- - **RAG Framework**: LlamaIndex with hierarchical node parsing
92
- - **Web Search**: Model Context Protocol (MCP) tools with automatic fallback to DuckDuckGo
93
- - **MCP Client**: Python MCP SDK for standardized tool integration
94
- - **Gemini MCP Server**: mcp-server via MCP protocol
95
 
96
  ## 📋 Requirements
97
 
98
  See `requirements.txt` for full dependency list. Key dependencies:
99
- - **MCP Integration**: `mcp`, `nest-asyncio` (primary - for MCP protocol support)
100
- - **Fallback Dependencies**: `requests`, `beautifulsoup4` (used when MCP is not available)
101
- - **Core ML**: `transformers`, `torch`
102
- - **RAG Framework**: `llama-index`
103
- - **Utilities**: `langdetect`, `gradio`, `spaces`
 
104
 
105
  ### 🔌 MCP Configuration
106
 
107
- The application uses Gemini MCP (Model Context Protocol) for translation, document parsing, transcription, and summarization. Configure Gemini MCP server via environment variables:
108
 
109
  ```bash
110
- # Gemini MCP Server (required)
111
  export GEMINI_API_KEY="your-gemini-api-key"
112
 
113
- # Gemini MCP Server Configuration
114
  export MCP_SERVER_COMMAND="python"
115
- export MCP_SERVER_ARGS="-m mcp_server"
116
 
117
- # Optional Gemini Configuration
118
- export GEMINI_MODEL="gemini-2.5-flash" # For harder tasks (default)
119
- export GEMINI_MODEL_LITE="gemini-2.5-flash-lite" # For parsing and simple tasks (default)
120
  export GEMINI_TIMEOUT=300000 # Request timeout in milliseconds (default: 5 minutes)
121
  export GEMINI_MAX_OUTPUT_TOKENS=8192 # Maximum output tokens (default)
122
- export GEMINI_MAX_FILES=10 # Maximum number of files per request (default)
123
- export GEMINI_MAX_TOTAL_FILE_SIZE=50 # Maximum total file size in MB (default)
124
  export GEMINI_TEMPERATURE=0.2 # Temperature for generation 0-2 (default: 0.2)
125
  ```
126
 
127
- **Available Gemini MCP Tools:**
128
- - **Translation**: Multi-language translation using Gemini MCP (gemini-2.5-flash-lite)
129
- - **Document Parsing**: Extract text from PDF, Word, and other documents using Gemini MCP
130
- - **Speech-to-Text**: Audio transcription using Gemini MCP (gemini-2.5-flash-lite)
131
- - **Summarization**: Web content summarization using Gemini MCP (gemini-2.5-flash)
132
- - **Reasoning & Reflection**: Query analysis and answer quality evaluation using Gemini MCP
133
-
134
- **Supported File Types for Document Parsing:**
135
- - Documents: PDF, DOC, DOCX (treated as images, one page = one image)
136
- - Text: TXT, MD, JSON, XML, CSV
137
- - Images: JPG, JPEG, PNG, GIF, WebP, SVG, BMP, TIFF
138
- - Audio: MP3, WAV, AIFF, AAC, OGG, FLAC (up to 15MB per file)
139
- - Video: MP4, AVI, MOV, WEBM, FLV, MPG, WMV (up to 10 files per request)
140
-
141
- 1. **Install MCP Python SDK** (already in requirements.txt):
142
- ```bash
143
- pip install mcp nest-asyncio
144
- ```
145
 
146
- 2. **Install Gemini MCP Server**:
147
  ```bash
148
- # Install Python package
149
- pip install mcp-server
150
  ```
151
 
152
- 3. **Get Gemini API Key**:
153
  - Visit [Google AI Studio](https://aistudio.google.com/) to get your API key
154
- - Set it as an environment variable: `export GEMINI_API_KEY="your-api-key"`
155
 
156
- 4. **Configure via Environment Variables**:
157
- ```bash
158
- export GEMINI_API_KEY="your-gemini-api-key"
159
- export MCP_SERVER_COMMAND="python"
160
- export MCP_SERVER_ARGS="-m mcp_server"
161
- ```
162
 
163
- **Note**: The application requires Gemini MCP for translation, document parsing, transcription, and summarization. Web search functionality still supports fallback to direct library calls if MCP is not configured.
164
 
165
  ## 🎯 Use Cases
166
 
167
- - Medical document Q&A
168
- - Clinical information retrieval
169
- - Medical research assistance
170
- - Multi-language medical consultations
171
- - Evidence-based medical answers
172
 
173
  ## 🏥 Enterprise-Level Clinical Decision Support
174
 
@@ -274,16 +250,15 @@ MedLLM Agent is designed to support **doctors, clinicians, and medical specialis
274
 
275
  ### **Implementation in Clinical Settings**
276
 
277
- **Hospital Systems**: Deploy for clinical decision support, integrating with EMR systems and institutional medical libraries.
 
 
 
278
 
279
- **Specialty Clinics**: Customize for specific medical specialties by uploading specialty-specific documents and guidelines.
280
-
281
- **Medical Education**: Support medical training and education with comprehensive, evidence-based answers.
282
 
283
- **Research Institutions**: Accelerate medical research by synthesizing information from multiple sources.
284
 
285
  ---
286
 
287
- **Note**: This system is designed to **assist** medical professionals with information retrieval and synthesis. It does not replace clinical judgment. All medical decisions should be made by qualified healthcare professionals who consider the full clinical context, patient-specific factors, and their professional expertise.
288
-
289
- > Introduction: A medical app for MCP-1st-Birthday hackathon, integrating MCP searcher and document RAG with autonomous reasoning, planning, and execution capabilities for enterprise-level clinical decision support.
 
27
  ### 📄 **Document RAG (Retrieval-Augmented Generation)**
28
  - Upload medical documents (PDF, Word, TXT, MD, JSON, XML, CSV) and get answers based on your uploaded content
29
  - Document parsing powered by Gemini MCP for accurate text extraction
30
+ - Hierarchical document indexing with auto-merging retrieval for comprehensive context
31
  - Mitigates hallucination by grounding responses in your documents
32
  - Toggle RAG on/off - when disabled, provides concise clinical answers without document context
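The hierarchical indexing and auto-merging bullet above can be illustrated with a small standard-library sketch. It is not the project's code and does not use LlamaIndex (which provides its own `HierarchicalNodeParser` and `AutoMergingRetriever`); the names `Node`, `build_hierarchy`, and `auto_merge`, the character-based chunk sizes, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    node_id: str
    text: str
    parent_id: Optional[str] = None  # None for parent-level chunks


def build_hierarchy(text: str, parent_size: int = 40, leaf_size: int = 10):
    """Split text into parent chunks, then split each parent into leaf chunks."""
    parents, leaves = {}, []
    for p_idx, p_start in enumerate(range(0, len(text), parent_size)):
        parent = Node(f"p{p_idx}", text[p_start:p_start + parent_size])
        parents[parent.node_id] = parent
        for l_idx, l_start in enumerate(range(0, len(parent.text), leaf_size)):
            leaf_text = parent.text[l_start:l_start + leaf_size]
            leaves.append(Node(f"p{p_idx}-l{l_idx}", leaf_text, parent.node_id))
    return parents, leaves


def auto_merge(retrieved, parents, leaves, threshold: float = 0.5):
    """Swap retrieved leaves for their parent when enough siblings were hit."""
    siblings = Counter(leaf.parent_id for leaf in leaves)
    hits = Counter(leaf.parent_id for leaf in retrieved)
    merged, seen = [], set()
    for leaf in retrieved:
        pid = leaf.parent_id
        if hits[pid] / siblings[pid] > threshold:
            if pid not in seen:  # emit each merged parent only once
                merged.append(parents[pid])
                seen.add(pid)
        else:
            merged.append(leaf)
    return merged
```

The idea is that small leaf chunks keep retrieval precise, while merging up to the parent restores surrounding context once several neighbouring leaves match the same query.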
33
 
 
40
  - **Enriches Context**: Combines document RAG + web sources for comprehensive answers
41
 
42
  ### 🧠 **MedSwin Medical Specialist Models**
43
+ - **MedSwin TA** (default) - Task-Aware merged model
44
+ - **MedSwin SFT** - Supervised Fine-Tuned model
45
  - **MedSwin KD** - Knowledge Distillation model
 
46
  - Models download on-demand for efficient resource usage
47
  - Fine-tuned on MedAlpaca-7B for medical domain expertise
48
 
 
54
  - Powered by Gemini MCP for translation
55
 
56
  ### 🎤 **Voice Features**
57
+ - **Speech-to-Text**: Voice input transcription using Gemini MCP
58
+ - **Text-to-Speech**: Voice output generation using the Maya1 TTS model (optional; falls back to MCP if unavailable)
 
59
 
60
  ### ⚙️ **Advanced Configuration**
61
  - Customizable generation parameters (temperature, top-p, top-k)
 
80
  ## 🔧 Technical Details
81
 
82
  - **Medical Models**: MedSwin/MedSwin-7B-SFT, MedSwin-7B-KD, MedSwin-Merged-TA-SFT-0.7
83
+ - **Translation**: Gemini MCP (gemini-2.5-flash-lite)
84
+ - **Document Parsing**: Gemini MCP (PDF, Word, TXT, MD, JSON, XML, CSV)
85
  - **Speech-to-Text**: Gemini MCP (gemini-2.5-flash-lite)
86
+ - **Summarization**: Gemini MCP (gemini-2.5-flash)
87
  - **Reasoning & Reflection**: Gemini MCP (gemini-2.5-flash)
88
+ - **Text-to-Speech**: maya-research/maya1 (optional, with MCP fallback)
89
  - **Embedding Model**: abhinand/MedEmbed-large-v0.1 (domain-tuned medical embeddings)
90
+ - **RAG Framework**: LlamaIndex with hierarchical node parsing and auto-merging retrieval
91
+ - **Web Search**: MCP tools with automatic fallback to DuckDuckGo
92
+ - **MCP Server**: Bundled Python-based Gemini MCP server (agent.py)
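The model assignments in the list above amount to a small routing table: lighter tasks go to `gemini-2.5-flash-lite`, heavier ones to `gemini-2.5-flash`. A minimal sketch of that mapping follows. The task keys and the `pick_model` helper are hypothetical, and document parsing is left out because the list does not name a model for it.

```python
LITE_MODEL = "gemini-2.5-flash-lite"
FULL_MODEL = "gemini-2.5-flash"

# Hypothetical task keys; the model per task follows the list above.
TASK_MODELS = {
    "translation": LITE_MODEL,
    "speech_to_text": LITE_MODEL,
    "summarization": FULL_MODEL,
    "reasoning": FULL_MODEL,
}


def pick_model(task: str, default: str = FULL_MODEL) -> str:
    """Return the model for a task, using the default for unlisted tasks."""
    return TASK_MODELS.get(task, default)
```

Falling back to the larger model for unlisted tasks is an assumption of this sketch, not documented behavior.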
 
93
 
94
  ## 📋 Requirements
95
 
96
  See `requirements.txt` for full dependency list. Key dependencies:
97
+ - **MCP Integration**: `mcp`, `nest-asyncio`, `google-genai` (for Gemini MCP server)
98
+ - **Fallback Dependencies**: `requests`, `beautifulsoup4`, `ddgs` (used when MCP web search is unavailable)
99
+ - **Core ML**: `transformers`, `torch`, `accelerate`
100
+ - **RAG Framework**: `llama-index`, `llama_index.llms.huggingface`, `llama_index.embeddings.huggingface`
101
+ - **Utilities**: `langdetect`, `gradio`, `spaces`, `soundfile`
102
+ - **TTS**: `TTS` package (optional; voice features fall back to MCP if it is unavailable)
103
 
104
  ### 🔌 MCP Configuration
105
 
106
+ The application uses a bundled Gemini MCP server (agent.py) for translation, document parsing, transcription, and summarization. Configure it via environment variables:
107
 
108
  ```bash
109
+ # Required: Gemini API Key
110
  export GEMINI_API_KEY="your-gemini-api-key"
111
 
112
+ # Optional: Gemini MCP Server Configuration (defaults to bundled agent.py)
113
  export MCP_SERVER_COMMAND="python"
114
+ export MCP_SERVER_ARGS="/path/to/agent.py" # Default: bundled agent.py
115
 
116
+ # Optional: Gemini Model Configuration
117
+ export GEMINI_MODEL="gemini-2.5-flash" # For complex tasks (default)
118
+ export GEMINI_MODEL_LITE="gemini-2.5-flash-lite" # For simple tasks (default)
119
  export GEMINI_TIMEOUT=300000 # Request timeout in milliseconds (default: 5 minutes)
120
  export GEMINI_MAX_OUTPUT_TOKENS=8192 # Maximum output tokens (default)
 
 
121
  export GEMINI_TEMPERATURE=0.2 # Temperature for generation 0-2 (default: 0.2)
122
  ```
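The environment variables above could be read at startup along these lines. This is a minimal sketch rather than the project's actual loader: `GeminiConfig` and `load_gemini_config` are hypothetical names, and the validation rules are assumptions based only on the defaults and ranges given in the comments above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GeminiConfig:
    api_key: str
    model: str
    model_lite: str
    timeout_ms: int
    max_output_tokens: int
    temperature: float


def load_gemini_config(env=None) -> GeminiConfig:
    """Build a config from environment variables, using the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "")
    if not api_key:
        # The README marks this variable as required.
        raise RuntimeError("GEMINI_API_KEY is required")
    temperature = float(env.get("GEMINI_TEMPERATURE", "0.2"))
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("GEMINI_TEMPERATURE must be between 0 and 2")
    return GeminiConfig(
        api_key=api_key,
        model=env.get("GEMINI_MODEL", "gemini-2.5-flash"),
        model_lite=env.get("GEMINI_MODEL_LITE", "gemini-2.5-flash-lite"),
        timeout_ms=int(env.get("GEMINI_TIMEOUT", "300000")),
        max_output_tokens=int(env.get("GEMINI_MAX_OUTPUT_TOKENS", "8192")),
        temperature=temperature,
    )
```

Passing a plain dictionary as `env` makes the loader easy to exercise without touching the real process environment.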
123
 
124
+ **Setup Steps:**
125
 
126
+ 1. **Install Dependencies** (already in requirements.txt):
127
  ```bash
128
+ pip install mcp nest-asyncio google-genai
 
129
  ```
130
 
131
+ 2. **Get Gemini API Key**:
132
  - Visit [Google AI Studio](https://aistudio.google.com/) to get your API key
133
+ - Set it: `export GEMINI_API_KEY="your-api-key"`
134
 
135
+ 3. **Run the Application**:
136
+ - The bundled MCP server (agent.py) will be used automatically
137
+ - No additional MCP server installation required
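How the server launch command might be resolved from `MCP_SERVER_COMMAND` and `MCP_SERVER_ARGS` is sketched below. This is an illustration under stated assumptions, not the application's code: `resolve_mcp_server_command` is a hypothetical helper, the `python` default for the command is assumed from the example value, and the bundled `agent.py` is assumed to sit in the application directory.

```python
import os
import shlex
from pathlib import Path


def resolve_mcp_server_command(env=None, app_dir=None):
    """Return the argv list used to start the MCP server process."""
    env = os.environ if env is None else env
    app_dir = Path(app_dir) if app_dir is not None else Path.cwd()
    command = env.get("MCP_SERVER_COMMAND", "python")
    raw_args = env.get("MCP_SERVER_ARGS")
    if raw_args:
        # Split like a shell would, so quoted paths with spaces survive.
        args = shlex.split(raw_args)
    else:
        # Assumed location of the bundled server script.
        args = [str(app_dir / "agent.py")]
    return [command, *args]
```

With no variables set, the result points at the bundled script, which matches the "no additional installation" step above.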
 
 
 
138
 
139
+ **Note**: The application requires Gemini MCP for translation, document parsing, transcription, and summarization. Web search falls back to calling the DuckDuckGo API directly if MCP web search tools are unavailable.
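The web search fallback described in the note can be sketched as a try-then-fall-back wrapper. The function names and signatures here are hypothetical, and treating an empty MCP result the same as a failure is an assumption of this sketch.

```python
from typing import Callable, List, Optional, Sequence, Tuple

SearchFn = Callable[[str], Sequence[str]]


def search_with_fallback(
    query: str,
    mcp_search: Optional[SearchFn],
    fallback_search: SearchFn,
) -> Tuple[List[str], str]:
    """Try the MCP search tool first; use the fallback if it is missing or fails."""
    if mcp_search is not None:
        try:
            results = list(mcp_search(query))
            if results:
                return results, "mcp"
        except Exception:
            # Any MCP failure is treated as "tool unavailable" in this sketch.
            pass
    return list(fallback_search(query)), "fallback"
```

Returning the source label alongside the results lets the caller log or display which path answered the query.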
140
 
141
  ## 🎯 Use Cases
142
 
143
+ - **Clinical Decision Support**: Evidence-based answers from documents and current medical literature
144
+ - **Medical Document Q&A**: Query uploaded patient records, research papers, and clinical guidelines
145
+ - **Multi-Language Consultations**: Automatic translation for international patient care
146
+ - **Research Assistance**: Synthesize information from multiple medical sources
147
+ - **Drug Information**: Comprehensive drug information with interaction analysis
148
 
149
  ## 🏥 Enterprise-Level Clinical Decision Support
150
 
 
250
 
251
  ### **Implementation in Clinical Settings**
252
 
253
+ - **Hospital Systems**: Clinical decision support with EMR integration and institutional medical libraries
254
+ - **Specialty Clinics**: Customize with specialty-specific documents and guidelines
255
+ - **Medical Education**: Comprehensive, evidence-based answers for training and education
256
+ - **Research Institutions**: Accelerate research by synthesizing information from multiple sources
257
 
258
+ ---
 
 
259
 
260
+ **⚠️ Important Disclaimer**: This system is designed to **assist** medical professionals with information retrieval and synthesis. It does not replace clinical judgment. All medical decisions must be made by qualified healthcare professionals who consider the full clinical context, patient-specific factors, and their professional expertise.
261
 
262
  ---
263
 
264
+ > **Built for the MCP-1st-Birthday Hackathon**: Enterprise-level clinical decision support system integrating MCP, document RAG, and autonomous reasoning capabilities.