# Setup Guide - Clinical Review Assistant

## Prerequisites

Before you begin, ensure you have:
- **Python 3.9+** installed
- **Git** installed
- **OpenAI API key** ([Get one here](https://platform.openai.com/api-keys))
- **Pinecone account and API key** ([Sign up here](https://www.pinecone.io/))
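
The local prerequisites can be checked before going further. The snippet below is a minimal sketch using only the standard library; the helper name `check_prerequisites` is illustrative and not part of the project.

```python
import shutil
import sys

MIN_PYTHON = (3, 9)  # minimum version listed in the prerequisites


def check_prerequisites(min_python=MIN_PYTHON):
    """Return a dict mapping each local prerequisite to True or False."""
    return {
        "python": sys.version_info[:2] >= min_python,
        "git": shutil.which("git") is not None,  # True if git is on PATH
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'missing or too old'}")
```

The API keys are not checked here, since they are only configured in Step 4.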

---

## Step 1: Clone the Repository
```bash
git clone https://github.com/mudejayaprakash/Clinical_Review_Assistant

cd Clinical_Review_Assistant
```

---

## Step 2: Create Virtual Environment (Recommended)

**macOS/Linux:**
```bash
python3 -m venv venv
source venv/bin/activate
```

**Windows:**
```bat
python -m venv venv
venv\Scripts\activate
```

---

## Step 3: Install Dependencies
```bash
pip install -r requirements.txt
```

**Expected installation time:** 2-3 minutes
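
To confirm the install succeeded, you can check that the key packages are importable. This is a rough sketch: the module names below are assumed from the snippets in this guide (`streamlit`, `pinecone`, `dotenv`) and may not match everything in `requirements.txt`.

```python
import importlib.util

# Import names assumed from this guide; adjust to match requirements.txt.
EXPECTED_MODULES = ["streamlit", "pinecone", "dotenv"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    print("All expected modules found." if not missing else f"Missing: {missing}")
```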

---

## Step 4: Configure Environment Variables

1. **Copy the example file:**
```bash
cp .env.example .env
```

2. **Edit `.env` file** and add your API keys:
```bash
# Required API Keys
OPENAI_API_KEY=sk-your-openai-api-key-here
PINECONE_API_KEY=your-pinecone-api-key-here
PINECONE_INDEX_NAME=medical-policies
PINECONE_NAMESPACE=policies

# Application Settings (Optional - defaults provided)
MODEL_SUMMARY=gpt-4o
MODEL_EVALUATION=gpt-4o
EMBEDDING_MODEL=cambridgeltl/SapBERT-from-PubMedBERT-fulltext
```

**Important:** Never commit the `.env` file to Git. It is already listed in `.gitignore`.
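
A quick way to catch a missing or empty key is to parse the file and compare it against the required names. The sketch below uses a deliberately simple `KEY=VALUE` parser rather than `python-dotenv`, and the helper names are hypothetical; it treats the four variables listed under "Required API Keys" as required.

```python
from pathlib import Path

# Keys listed under "Required API Keys" in the .env template.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_INDEX_NAME",
    "PINECONE_NAMESPACE",
]


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    # Only key names are printed, never the secret values.
    print("Missing keys:", missing_keys(parse_env_text(text)))
```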

---

## Step 5: Set Up Pinecone Index

### Option A: Create Index via Pinecone Dashboard

1. Go to [Pinecone Console](https://app.pinecone.io/)
2. Click "Create Index"
3. Configure:
   - **Name:** `medical-policies`
   - **Dimensions:** `768` (for SapBERT embeddings)
   - **Metric:** `cosine`
   - **Region:** Choose closest to you
4. Click "Create Index"
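
For context on the `cosine` metric and the dimension setting: cosine similarity compares the direction of two vectors, and it is only defined for vectors of the same length, which is why the index dimension must match the embedding model's output size. The following is an illustrative pure-Python sketch of the formula, not code from the project or from Pinecone.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # same direction
    print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # orthogonal vectors
```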

### Option B: Create Index via Python
```bash
python3 << 'EOF'
from pinecone import Pinecone
import os
from dotenv import load_dotenv

load_dotenv()
pc = Pinecone(api_key=os.getenv('PINECONE_API_KEY'))

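# Create the index, using the name configured in .env (falls back to the default)
pc.create_index(
    name=os.getenv('PINECONE_INDEX_NAME', 'medical-policies'),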
    dimension=768,
    metric='cosine',
    spec={'serverless': {'cloud': 'aws', 'region': 'us-east-1'}}
)
print("✅ Pinecone index created successfully!")
EOF
```

---

## Step 6: Load Policy Documents (Optional)

**If you have insurance policy PDFs to load:**

1. **Place policy PDFs** in the `data/raw_policy_pdf/` folder:
```bash
mkdir -p data/raw_policy_pdf
# Copy your policy PDFs into this folder
```

2. **Run the data ingestion script:**
```bash
python tools/data_ingestion.py
```

This will:
- Extract text from PDFs
- Create chunks with section-aware splitting
- Generate SapBERT embeddings
- Upload to Pinecone index
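
To give a feel for what section-aware splitting means, here is a minimal sketch. It is not the code in `tools/data_ingestion.py`: the heading pattern (numbered headings such as `1. Scope`), the function names, and the `max_chars` limit are all assumptions made for illustration. The idea is that a chunk never spans two sections, so each chunk can carry its section heading as metadata.

```python
import re

# Assumed heading pattern: numbered headings such as "1. Scope" or "2.1 Criteria".
# This heuristic can misfire on body lines that begin with a number.
HEADING_RE = re.compile(r"^\d+(\.\d+)*\.?\s+\S")


def split_into_sections(text):
    """Group non-empty lines under the most recent heading as (heading, body) pairs."""
    sections = []
    heading, body = "Preamble", []
    for line in text.splitlines():
        stripped = line.strip()
        if HEADING_RE.match(stripped):
            if body:  # headings with no body text are dropped
                sections.append((heading, " ".join(body)))
            heading, body = stripped, []
        elif stripped:
            body.append(stripped)
    if body:
        sections.append((heading, " ".join(body)))
    return sections


def chunk_sections(text, max_chars=500):
    """Split each section into word-aligned chunks that never cross a section boundary."""
    chunks = []
    for heading, body in split_into_sections(text):
        current = ""
        for word in body.split():
            candidate = f"{current} {word}".strip()
            if current and len(candidate) > max_chars:
                chunks.append({"section": heading, "text": current})
                current = word
            else:
                current = candidate
        if current:
            chunks.append({"section": heading, "text": current})
    return chunks
```

In a real pipeline, each chunk's text would then be embedded and uploaded together with its section label.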

**Expected time:** 2-5 minutes for 10 policies

**Note:** You can skip this step and test with an empty policy database, but the policy retrieval step (Node 2) will not return any policies.

---

## Step 7: Run the Application
```bash
streamlit run app.py
```

The application will open in your browser at: `http://localhost:8501`

---

## Step 8: Create Your First Account

1. On the login page, click the **"Register"** tab
2. Enter a username and password
3. Click **"Create Account"**
4. Log in with your new credentials

---

## Testing the Application

### Quick Test Workflow

1. **Upload a test medical record** (PDF format)
2. Click **"Summarize and Analyze Records"**
3. Review the generated summary and chief complaints
4. View retrieved policies (if you loaded policy documents)
5. Enter test criteria:
```
   • Patient must be 18 years or older
   • Conservative medical management has failed
   • CT scan or endoscopy confirms septal deviation
```
6. Click **"Evaluate Criteria"**
7. Review results with evidence, page numbers and confidence scores

---

## Troubleshooting

### Issue: "ModuleNotFoundError"
**Solution:** Ensure you're in the virtual environment and run:
```bash
pip install -r requirements.txt
```
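
If you are unsure whether the virtual environment is active, the interpreter can tell you. This small sketch relies on the standard behavior that `sys.prefix` differs from `sys.base_prefix` inside a `venv`; the function name is illustrative.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv (prefix differs from base prefix)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtualenv())
```

If this reports `False`, activate the environment from Step 2 and reinstall the dependencies.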

### Issue: "OpenAI API key not found"
**Solution:** Check that your `.env` file exists and defines `OPENAI_API_KEY` (note that this command prints the key to your terminal):
```bash
grep OPENAI_API_KEY .env
```

### Issue: "Pinecone index not found"
**Solution:** Verify that the index name in `.env` matches the one shown in the Pinecone dashboard:
```bash
python3 -c "from pinecone import Pinecone; import os; from dotenv import load_dotenv; load_dotenv(); pc = Pinecone(api_key=os.getenv('PINECONE_API_KEY')); print(pc.list_indexes())"
```

### Issue: "PDF processing fails"
**Solution:** Ensure the PDF:
- Is under 50 MB
- Is not password-protected
- Contains extractable text (not just scanned images)
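
The first two conditions can be pre-checked cheaply before uploading. The sketch below is a heuristic using only the standard library, with hypothetical helper names: it looks at the file size, the `%PDF-` header, and the presence of an `/Encrypt` entry, which encrypted PDFs normally contain. It does not test whether the text is extractable, since that needs a PDF library.

```python
MAX_PDF_BYTES = 50 * 1024 * 1024  # 50 MB limit from the checklist above


def precheck_pdf_bytes(data, max_bytes=MAX_PDF_BYTES):
    """Return a list of problems found by cheap heuristic checks (empty if none)."""
    problems = []
    if len(data) > max_bytes:
        problems.append("file is larger than the size limit")
    if not data.startswith(b"%PDF-"):
        problems.append("file does not start with a PDF header")
    if b"/Encrypt" in data:
        # Heuristic: encrypted PDFs reference an /Encrypt dictionary.
        problems.append("file appears to be encrypted")
    return problems


def precheck_pdf(path, max_bytes=MAX_PDF_BYTES):
    """Read a file from disk and run the same heuristic checks on its bytes."""
    with open(path, "rb") as handle:
        return precheck_pdf_bytes(handle.read(), max_bytes)
```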

### Issue: "Port 8501 already in use"
**Solution:** Stop other Streamlit instances or use a different port:
```bash
streamlit run app.py --server.port 8502
```
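
To find out which port is free before picking one, you can probe the local ports. This is a minimal sketch with illustrative function names: it simply attempts a TCP connection to `127.0.0.1` and treats a successful connection as "in use".

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


def first_free_port(start=8501, attempts=20):
    """Return the first port at or after `start` with no listener, or None."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    return None


if __name__ == "__main__":
    print("Port 8501 in use:", port_in_use(8501))
    print("Suggested port:", first_free_port())
```

The suggested port can then be passed to `--server.port`.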

---

## Project Structure
```
Clinical_Review_Assistant/
├── app.py                          # Main Streamlit application
├── agents/
│   ├── __init__.py
│   ├── config.py                   # Configuration settings
│   ├── agent.py                    # LangGraph agent orchestrator
│   ├── nodes.py                    # Node 1, 2, 3 implementations
│   ├── security.py                 # Security & audit logging
│   └── auth.py                     # Authentication system
├── tools/
│   ├── __init__.py
│   ├── rag.py                      # RAG utilities
│   ├── rag_pinecone.py             # Pinecone integration
│   └── data_ingestion.py           # Policy ingestion pipeline
├── data/
│   ├── raw_policy_pdf/             # Policy PDFs (you create)
│   └── policy_txt/                 # Extracted text (auto-generated)
├── docs/
│   ├── agent_workflow.png
│   ├── architecture_diagram.png    
│   ├── screnshots/                 # To display in README
│   └── setup_guide.md              # This file                
├── requirements.txt                # Python dependencies
├── .env.example                    # Environment template
├── .env                            # Your API keys (create from .env.example)
├── .gitignore                      # Git ignore file
└── README.md                       # Project documentation
```

---

## Next Steps

- **Customize policies:** Add new insurance policies to `data/raw_policy_pdf/`
- **Test with real data:** Upload actual medical records (ensure PHI compliance)
- **Adjust configuration:** Modify `agents/config.py` for custom settings
- **Review logs:** Check `security.log` for audit trail
- **Scale deployment:** Deploy to Streamlit Cloud or AWS for production use

---

## Support

For issues or questions:
- Check the [Troubleshooting](#troubleshooting) section above
- Review the README for detailed documentation


---

## Development Mode

To run in development mode with auto-reload:
```bash
streamlit run app.py --server.runOnSave true
```

To view detailed logs:
```bash
tail -f security.log
```

---

**Setup complete!** You're ready to start using the Clinical Review Assistant. 🎉