---
license: cc-by-nc-sa-4.0
library_name: keras
pipeline_tag: image-classification
language: en
tags:
- medical-imaging
- ct
- lung-cancer
- efficientnet-b0
- transfer-learning
- grad-cam
model-index:
- name: EfficientNetB0 Lung CT Classifier (4-class)
  results:
  - task:
      type: image-classification
      name: Image Classification
    dataset:
      name: Hany Lung Cancer CT (derived; cleaned)
      type: custom
      split: test
    metrics:
    - type: accuracy
      value: TODO:0.XX
    - type: precision
      value: TODO:0.XX
    - type: recall
      value: TODO:0.XX
    - type: f1
      value: TODO:0.XX
---

## Attribution

**Original Source:**
> Hany H. (2020). *Chest CT-Scan Images Dataset*. Kaggle.
> [https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset](https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset)

**Original License:**
> Database: Open Data Commons Open Database License (ODbL v1.0)
> [https://opendatacommons.org/licenses/odbl/1-0/](https://opendatacommons.org/licenses/odbl/1-0/)

**Derived Dataset Author:**
> Ashley Blackwell (2025). *Chest CT-Scan Images (Cleaned, Derived from Hany et al.)*. Hugging Face Datasets.
> https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany

---

## Cleaning & Preprocessing Summary

The original dataset was processed and curated to ensure **consistency, quality, and reproducibility** for use in deep-learning experiments (e.g., the *EfficientNet-B0 Lung CT Classifier*).

### Steps Performed

1. **Integrity Checks:** Removed corrupted or unreadable `.jpg` and `.png` files.
2. **Resolution Standardization:** Resized all images to `224 × 224 × 3` pixels.
3. **Color Normalization:** Converted grayscale scans to RGB format.
4. **Class Organization:** Verified folder structure for four diagnostic categories:
   - Adenocarcinoma
   - Large-Cell Carcinoma
   - Squamous-Cell Carcinoma
   - Normal
5. **Stratified Splits:**
   - Train: 70%
   - Validation: 20%
   - Test: 10%
6. **Metadata File:** Generated `metadata.csv` containing filename, class label, and original resolution for traceability.

---

## Dataset Overview

| Split | Approx. Images | Notes |
|:------|---------------:|:------|
| Train | ~TODO | Stratified by class |
| Validation | ~TODO | For hyperparameter tuning |
| Test | ~TODO | Final evaluation set |
| **Total** | ~TODO | All cleaned and standardized |

---

## Intended Use

- **Purpose:** Designed for research, coursework, and educational demonstrations in medical image classification, model interpretability (Grad-CAM), and reproducible machine learning pipelines.
- **Out of Scope:** This dataset **must not** be used for clinical diagnosis, treatment decisions, or commercial medical software development.

---

## Legal & License Information

### License

This dataset is distributed under the **Open Data Commons Open Database License (ODbL v1.0)**.

You are free to:

- **Share:** Copy, distribute, and use the database.
- **Create:** Produce works from the database.
- **Adapt:** Modify, transform, and build upon the database.

Full legal text: [https://opendatacommons.org/licenses/odbl/1-0/](https://opendatacommons.org/licenses/odbl/1-0/)

---

## Scope

- **Intended:** Research, UMGC coursework, model-interpretability demos (Grad-CAM), benchmarking.
- **Out-of-scope:** Clinical diagnosis, patient triage, or any safety-critical application.

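The 70/20/10 stratified split listed under the cleaning steps can be illustrated with a short sketch. The card does not say which tool produced the actual split, so the function below is an assumption for illustration only: `stratified_split`, its `fractions` and `seed` parameters, and the toy labels are all hypothetical names and values, not part of the released dataset tooling.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.2, 0.1), seed=0):
    """Return (train, val, test) index lists that keep per-class proportions.

    Hypothetical helper: the card does not document the real splitting code.
    """
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # whatever remains goes to test
    return train, val, test


# Toy labels (synthetic, not the real dataset): 4 classes with 10 items each.
classes = ("adenocarcinoma", "large_cell", "squamous_cell", "normal")
labels = [c for c in classes for _ in range(10)]
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # 28 8 4
```

Splitting each class separately, rather than shuffling the whole dataset once, is what keeps the class balance roughly equal across train, validation, and test.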
## Model Architecture

- **Backbone**: EfficientNet-B0 (ImageNet-initialized, fine-tuned)
- **Input size**: 224 × 224 × 3
- **Head**: GlobalAveragePooling → Dropout (TODO: rate) → Dense(4, softmax)
- **Loss**: Categorical Cross-Entropy
- **Optimizer**: TODO (e.g., Adam, lr = 1e-4 with decay)
- **Epochs / Batch size**: TODO
- **Class labels (index)**:
  - 0: Adenocarcinoma
  - 1: Large-Cell Carcinoma
  - 2: Squamous-Cell Carcinoma
  - 3: Normal

---

## Data & Preprocessing

- **Source**: Derived from the Hany Lung Cancer CT Scan dataset (Kaggle). Corrupted and irregular-resolution images were removed, and all remaining images were standardized to 224 × 224.
- **Split**: Train/Val/Test = 70/20/10 (stratified).
- **Transforms**: Resize → RGB conversion → scale pixels to [0, 1], or pass raw [0, 255] pixels through `preprocess_input`. The two options are not interchangeable: `tf.keras.applications.efficientnet.preprocess_input` is a pass-through, because the stock Keras EfficientNet models rescale their inputs internally. Inference must use whichever scaling was applied during training.
- **Artifacts logged**: Confusion matrix, classification report, Grad-CAM overlays.
- **Attribution**: Credit the original dataset per its license when sharing or publishing.

---

## Evaluation

- **Test set size**: TODO:N
- **Metrics**: Accuracy, plus macro-averaged Precision, Recall, and F1

| Class | Precision | Recall | F1 | Support |
|:------|----------:|-------:|---:|--------:|
| Adenocarcinoma | TODO | TODO | TODO | TODO |
| Large-Cell | TODO | TODO | TODO | TODO |
| Squamous | TODO | TODO | TODO | TODO |
| Normal | TODO | TODO | TODO | TODO |
| **Macro Avg** | TODO | TODO | TODO | N |

## Suggested Environment

- `tensorflow==2.15.0`
- `keras==2.15.0`
- `huggingface_hub>=0.23.0`
- `numpy>=1.24`

---

## Explainability (Grad-CAM)

- **Last conv layer**: `top_conv` for EfficientNet-B0.
- **Tip**: Use Grad-CAM to overlay heatmaps and check that the model focuses on pathologically relevant regions.

## Limitations, Bias & Ethical Considerations

- **Domain shift**: CT protocols and scanners vary, which may affect generalization.
- **Label noise**: Community datasets can contain mislabeled images.
- **Generalization**: The model is not clinically validated.
- **Mitigation**: Use Grad-CAM audits and external validation before any applied use.

---

## Training & Reproducibility

- **Hardware**: TODO (e.g., NVIDIA T4 / A100 / local GPU).
- **Training time**: TODO
- **Seed / Determinism**: TODO
- **Reproduction steps**: TODO (link to notebook or script if available).

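To show how the macro-averaged figures in the Evaluation table relate to the per-class rows, here is a minimal sketch that derives them from a confusion matrix. It is not the author's evaluation script: `macro_metrics` is a hypothetical helper, the rows-are-true, columns-are-predicted layout is an assumption, and the toy matrix holds synthetic counts, not results from this model.

```python
def macro_metrics(confusion):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Assumes confusion[i][j] counts samples with true class i predicted as j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))

    precisions, recalls, f1s = [], [], []
    for k in range(n_classes):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n_classes))  # column sum
        actual_k = sum(confusion[k])  # row sum, i.e. the class support
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    # Macro averaging: every class counts equally, regardless of support.
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1s) / n_classes,
    }


# Synthetic 4-class counts for illustration only (NOT this model's results).
toy_confusion = [
    [8, 1, 1, 0],
    [2, 6, 2, 0],
    [1, 1, 8, 0],
    [0, 0, 0, 10],
]
for name, value in macro_metrics(toy_confusion).items():
    print(f"{name}: {value:.3f}")
```

Because macro averaging weights each class equally, a weak minority class pulls the macro scores down more than it pulls down accuracy, which is why the card reports both.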
## License

- **Model weights & code**: CC BY-NC-SA 4.0 (non-commercial, share-alike, with attribution).
- **Dataset (derived)**: Follow the original dataset’s license terms and provide credit to the creator.

## Citation

If you use this model, please cite:

Blackwell, A. (2025). EfficientNet-B0 Lung CT Classifier (4-class) [Computer software]. Hugging Face. https://huggingface.co/TODO

```bibtex
@software{blackwell2025lungct,
  author    = {Blackwell, Ashley},
  title     = {EfficientNet-B0 Lung CT Classifier (4-class)},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/TODO}
}
```

## 👩‍🏫 Maintainers

Ashley Blackwell. **Questions and feedback welcome via the Hugging Face Discussions tab.**

## 🗒 Changelog

- 2025-10-06: Initial public release (.keras weights), added model card, class map, and metric placeholders.

---

## Citation

If you use this dataset, please cite both the original source and the derived version:

**Original dataset:**
> Hany H. (2020). *Chest CT-Scan Images Dataset*. Kaggle.
> https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset

**Derived version:**
> Blackwell, A. (2025). *Chest CT-Scan Images (Cleaned, Derived from Hany et al.)* [Dataset]. Hugging Face.
> https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany

```bibtex
@dataset{hany2020chestct,
  author    = {Hany, H.},
  title     = {Chest CT-Scan Images Dataset},
  year      = {2020},
  publisher = {Kaggle},
  url       = {https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset}
}

@dataset{blackwell2025lungctcleaned,
  author    = {Blackwell, Ashley},
  title     = {Chest CT-Scan Images (Cleaned, Derived from Hany et al.)},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany}
}
```

---

## How to Use (Load & Inference)

**Option A: Download from the Hub**

```python
import json

import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download
from tensorflow.keras.preprocessing import image

REPO_ID = "TODO:your-username/efficientnetb0-lung-ct-4class"

# Fetch the weights and the index-to-label map from the Hub (cached locally).
model_path = hf_hub_download(repo_id=REPO_ID, filename="model.keras")
class_map_path = hf_hub_download(repo_id=REPO_ID, filename="class_map.json")

model = tf.keras.models.load_model(model_path, compile=False)
with open(class_map_path) as f:
    idx_to_label = json.load(f)  # keys are string indices: "0", "1", ...


def preprocess(img_path):
    # load_img returns an RGB image by default, so grayscale slices get 3 channels.
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, 0)  # add the batch dimension
    # Must match training: drop this line if the model was trained on raw
    # [0, 255] pixels (Keras EfficientNet rescales internally).
    x = x / 255.0
    return x


x = preprocess("path/to/ct_slice.png")
probs = model.predict(x, verbose=0)[0]
for i, p in enumerate(probs):
    print(f"{idx_to_label[str(i)]}: {p:.3f}")
print("Predicted:", idx_to_label[str(int(np.argmax(probs)))])
```

**Option B: Snapshot Download (Local Folder)**

```python
import os

import tensorflow as tf
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="TODO:your-username/efficientnetb0-lung-ct-4class")
# model.keras and class_map.json now sit in local_dir; load them as in Option A.
model = tf.keras.models.load_model(os.path.join(local_dir, "model.keras"), compile=False)
```

---
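The loading code looks labels up with `idx_to_label[str(i)]`, which suggests that `class_map.json` maps string indices to class names. The sketch below shows that assumed shape and a small lookup helper. The JSON content is inferred from the class index list in this card and may not match the released file exactly; `top_prediction` is a hypothetical helper, and the probabilities are synthetic.

```python
import json

# Assumed shape of class_map.json, inferred from idx_to_label[str(i)] and the
# class index list in this card; the real file's strings may differ.
CLASS_MAP_JSON = """
{
  "0": "Adenocarcinoma",
  "1": "Large-Cell Carcinoma",
  "2": "Squamous-Cell Carcinoma",
  "3": "Normal"
}
"""


def top_prediction(probs, idx_to_label):
    """Return (label, probability) for the highest-scoring class."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    # JSON object keys are always strings, hence str(best) rather than best.
    return idx_to_label[str(best)], float(probs[best])


idx_to_label = json.loads(CLASS_MAP_JSON)
# Synthetic probabilities for illustration only, not model output.
print(top_prediction([0.10, 0.05, 0.15, 0.70], idx_to_label))
```

Indexing with an integer key would raise `KeyError`, since `json.load` never produces integer keys.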