luckychao committed
Commit ad1a4ab · verified · 1 Parent(s): 754aa49

Update README.md

Files changed (1)
  1. README.md +17 -20
README.md CHANGED
@@ -16,13 +16,13 @@ library_name: ThinkMorph-7B
 
 
 <p align="center">
- <a href="">
+ <a href="https://thinkmorph.github.io/">
 <img
 src="https://img.shields.io/badge/ThinkMorph-Website-0A66C2?logo=safari&logoColor=white"
 alt="ThinkMorph Website"
 />
 </a>
- <a href="">
+ <a href="https://arxiv.org/abs/2510.27492">
 <img
 src="https://img.shields.io/badge/ThinkMorph-Paper-red?logo=arxiv&logoColor=red"
 alt="ThinkMorph Paper on arXiv"
@@ -48,30 +48,19 @@ library_name: ThinkMorph-7B
 </a> -->
 </p>
 
- ## 💥 News
- - **[2025.10.29]** Our model checkpoint and training data are now accessible at [Huggingface](https://huggingface.co/ThinkMorph).
- - **[2025.10.29]** Our paper is now accessible at .
 
 ## 👀 About ThinkMorph
 
- Multimodal reasoning demands synergistic coordination of language and vision. However, determining what constitutes meaningful interleaved reasoning is non-trivial, and current approaches lack a generalizable recipe.
- We present **ThinkMorph**, a unified model that enables such generalization through a principled approach: treating text and images as complementary modalities that mutually advance reasoning.
 <p align="center">
- <img src="https://github.com/ThinkMorph/ThinkMorph/raw/main/assets/interleaved_design.jpg" width="100%"> <br>
- </p>
- Guided by this principle, we identify tasks requiring concrete, verifiable visual engagement and design a high-quality data pipeline that trains models to generate interleaved images and text as progressive reasoning traces.
- <p align="center">
- <img src="https://github.com/ThinkMorph/ThinkMorph/raw/main/assets/thinkmorph_main.jpg" width="100%"> <br>
+ <img src="https://github.com/ThinkMorph/ThinkMorph/raw/main/assets/thinkmorph.jpg" width="100%"> <br>
 </p>
 
- ThinkMorph delivers substantial gains on **vision-centric** tasks, achieving an average improvement of 34.74% over the base model while consistently surpassing text-only and image-only modes.
- By fine-tuning with **merely ~24K** samples, it achieves out-of-domain performance that rivals or even surpasses leading large-scale, proprietary VLMs.
+ We present **ThinkMorph**, a unified model fine-tuned on ∼24K high-quality interleaved reasoning traces across tasks, learning to generate progressive text–image reasoning steps that
+ concretely manipulate visual content while maintaining coherent verbal logic.
+
+ Beyond strong vision-benchmark performance and robust out-of-domain generalization, ThinkMorph demonstrates emergent multimodal intelligence, including novel visual manipulation skills and so on.
+ These findings suggest promising directions for characterizing the emergent capabilities of unified models for multimodal reasoning.
 
- Intriguingly, ThinkMorph unlocks emergent properties that represent a *hallmark of multimodal intelligence*: the elicitation of unseen visual manipulation skills, the self-adaptive switching between reasoning modes according to task complexity, and better test-time scaling via diversified thoughts.
- <p align="center">
- <img src="https://github.com/ThinkMorph/ThinkMorph/raw/main/assets/emrging_prop.jpg" width="100%"> <br>
- </p>
- These findings suggest promising directions for future work to characterize the emergent capabilities of unified models for multimodal reasoning.
 
 
 ## 📊 Benchmarks
@@ -95,5 +84,13 @@ These findings suggest promising directions for future work to characterize the
 ## ✍️ Citation
 
 ```bibtex
-
+ @misc{gu2025thinkmorphemergentpropertiesmultimodal,
+ title={ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning},
+ author={Jiawei Gu and Yunzhuo Hao and Huichen Will Wang and Linjie Li and Michael Qizhe Shieh and Yejin Choi and Ranjay Krishna and Yu Cheng},
+ year={2025},
+ eprint={2510.27492},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV},
+ url={https://arxiv.org/abs/2510.27492},
+ }
 ```
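
The removed News entry above pointed readers to the released checkpoint under the ThinkMorph org on the Hugging Face Hub (https://huggingface.co/ThinkMorph). A minimal sketch of fetching those files with `huggingface_hub`, assuming a repository id of `ThinkMorph/ThinkMorph-7B` (hypothetical; the README only links the org page, so check it for the exact repository name and the recommended loading code):

```python
# Minimal sketch: download the ThinkMorph checkpoint files from the Hub.
# NOTE: "ThinkMorph/ThinkMorph-7B" is an assumed repository id; the README only
# links the org page (https://huggingface.co/ThinkMorph), so verify the name there.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="ThinkMorph/ThinkMorph-7B")
print(f"Checkpoint files downloaded to: {local_dir}")
```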