Add link to paper, correct dataset name, change library_name
This PR improves the model card by linking it to the paper, correcting the dataset name, and updating the `library_name` to "transformers".
README.md (CHANGED)

````diff
@@ -1,6 +1,12 @@
 ---
-library_name: peft
 base_model: lmms-lab/llava-onevision-qwen2-0.5b-ov
+datasets:
+- Dataseeds/DataSeeds-Sample-Dataset-DSD
+language:
+- en
+library_name: transformers
+license: apache-2.0
+pipeline_tag: image-text-to-text
 tags:
 - vision-language
 - multimodal
@@ -11,12 +17,6 @@ tags:
 - photography
 - scene-analysis
 - image-captioning
-license: apache-2.0
-datasets:
-- Dataseeds/DataSeeds-Sample-Dataset-DSD
-language:
-- en
-pipeline_tag: image-text-to-text
 model-index:
 - name: LLaVA-OneVision-Qwen2-0.5b-ov-DSD-FineTune
   results:
@@ -24,26 +24,26 @@ model-index:
       type: image-captioning
       name: Image Captioning
     dataset:
-      type: Dataseeds/DataSeeds-Sample-Dataset-DSD
       name: DataSeeds.AI Sample Dataset
+      type: Dataseeds/DataSeeds-Sample-Dataset-DSD
     metrics:
     - type: bleu-4
       value: 0.0246
       name: BLEU-4
     - type: rouge-l
-      value: 0.
+      value: 0.214
       name: ROUGE-L
     - type: bertscore
       value: 0.2789
       name: BERTScore F1
     - type: clipscore
-      value: 0.
+      value: 0.326
       name: CLIPScore
 ---
 
 # LLaVA-OneVision-Qwen2-0.5b Fine-tuned on DataSeeds.AI Dataset
 
-This model is a LoRA (Low-Rank Adaptation) fine-tuned version of [lmms-lab/llava-onevision-qwen2-0.5b-ov](https://huggingface.co/lmms-lab/llava-onevision-qwen2-0.5b-ov) specialized for photography scene analysis and description generation. The model was fine-tuned on the [DataSeeds Sample Dataset (DSD)](https://huggingface.co/datasets/Dataseeds/DataSeeds-Sample-Dataset-DSD) to enhance its capabilities in generating detailed, accurate descriptions of photographic content.
+This model is a LoRA (Low-Rank Adaptation) fine-tuned version of [lmms-lab/llava-onevision-qwen2-0.5b-ov](https://huggingface.co/lmms-lab/llava-onevision-qwen2-0.5b-ov) specialized for photography scene analysis and description generation. The model was presented in the paper [Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds' Annotated Imagery](https://huggingface.co/papers/2506.05673). The model was fine-tuned on the [DataSeeds Sample Dataset (DSD)](https://huggingface.co/datasets/Dataseeds/DataSeeds-Sample-Dataset-DSD) to enhance its capabilities in generating detailed, accurate descriptions of photographic content.
 
 ## Model Description
 
@@ -67,7 +67,7 @@ This model is a LoRA (Low-Rank Adaptation) fine-tuned version of [lmms-lab/llava
 ## Training Details
 
 ### Dataset
-The model was fine-tuned on the
+The model was fine-tuned on the DataSeeds Sample Dataset, a curated collection of photography images with detailed scene descriptions focusing on:
 - Compositional elements and camera perspectives
 - Lighting conditions and visual ambiance
 - Product identification and technical details
@@ -187,7 +187,8 @@ for prompt in prompts:
     outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7)
     description = processor.decode(outputs[0], skip_special_tokens=True)
     print(f"Prompt: {prompt}")
-    print(f"Description: {description}
+    print(f"Description: {description}
+")
 ```
 
 ## Model Architecture
@@ -212,7 +213,7 @@ The model maintains the LLaVA-OneVision architecture with the following componen
 
 ## Training Data
 
-The
+The DataSeeds Sample Dataset contains curated photography images with comprehensive annotations including:
 
 - **Scene Descriptions**: Detailed textual descriptions of visual content
 - **Technical Metadata**: Camera settings, composition details
@@ -230,7 +231,7 @@ The dataset focuses on enhancing the model's ability to:
 ### Model Limitations
 - **Domain Specialization**: Optimized for photography; may have reduced performance on general vision-language tasks
 - **Base Model Inheritance**: Inherits limitations from LLaVA-OneVision base model
-- **Training Data Bias**: May reflect biases present in the
+- **Training Data Bias**: May reflect biases present in the DataSeeds dataset
 - **Language Support**: Primarily trained and evaluated on English descriptions
 
 ### Recommended Use Cases
@@ -252,7 +253,7 @@ If you use this model in your research or applications, please cite:
 
 ```bibtex
 @article{abdoli2025peerranked,
-  title={Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from
+  title={Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds' Annotated Imagery},
   author={Sajjad Abdoli and Freeman Lewin and Gediminas Vasiliauskas and Fabian Schonholz},
   journal={arXiv preprint arXiv:2506.05673},
   year={2025},
@@ -290,9 +291,9 @@ This model is released under the Apache 2.0 license, consistent with the base LL
 
 - **Base Model**: Thanks to LMMS Lab for the LLaVA-OneVision model
 - **Vision Encoder**: Thanks to Google Research for the SigLIP model
-- **Dataset**:
+- **Dataset**: DataSeeds photography community for the source imagery
 - **Framework**: Hugging Face PEFT library for efficient fine-tuning capabilities
 
 ---
 
-*For questions, issues, or collaboration opportunities, please visit the [model repository](https://huggingface.co/Dataseeds/LLaVA-OneVision-Qwen2-0.5b-ov-DSD-FineTune) or contact the DataSeeds.AI team.*
+*For questions, issues, or collaboration opportunities, please visit the [model repository](https://huggingface.co/Dataseeds/LLaVA-OneVision-Qwen2-0.5b-ov-DSD-FineTune) or contact the DataSeeds.AI team.*
````