luckychao committed
Commit ad1a4ab · verified · 1 Parent(s): 754aa49

Update README.md

Files changed (1)
  1. README.md +17 -20
README.md CHANGED
@@ -16,13 +16,13 @@ library_name: ThinkMorph-7B
 
 
 <p align="center">
- <a href="">
+ <a href="https://thinkmorph.github.io/">
 <img
 src="https://img.shields.io/badge/ThinkMorph-Website-0A66C2?logo=safari&logoColor=white"
 alt="ThinkMorph Website"
 />
 </a>
- <a href="">
+ <a href="https://arxiv.org/abs/2510.27492">
 <img
 src="https://img.shields.io/badge/ThinkMorph-Paper-red?logo=arxiv&logoColor=red"
 alt="ThinkMorph Paper on arXiv"
@@ -48,30 +48,19 @@ library_name: ThinkMorph-7B
 </a> -->
 </p>
 
- ## 💥 News
- - **[2025.10.29]** Our model checkpoint and training data are now accessible at [Huggingface](https://huggingface.co/ThinkMorph).
- - **[2025.10.29]** Our paper is now accessible at .
 
 ## 👀 About ThinkMorph
 
- Multimodal reasoning demands synergistic coordination of language and vision. However, determining what constitutes meaningful interleaved reasoning is non-trivial, and current approaches lack a generalizable recipe.
- We present **ThinkMorph**, a unified model that enables such generalization through a principled approach: treating text and images as complementary modalities that mutually advance reasoning.
 <p align="center">
- <img src="https://github.com/ThinkMorph/ThinkMorph/raw/main/assets/interleaved_design.jpg" width="100%"> <br>
- </p>
- Guided by this principle, we identify tasks requiring concrete, verifiable visual engagement and design a high-quality data pipeline that trains models to generate interleaved images and text as progressive reasoning traces.
- <p align="center">
- <img src="https://github.com/ThinkMorph/ThinkMorph/raw/main/assets/thinkmorph_main.jpg" width="100%"> <br>
+ <img src="https://github.com/ThinkMorph/ThinkMorph/raw/main/assets/thinkmorph.jpg" width="100%"> <br>
 </p>
 
- ThinkMorph delivers substantial gains on **vision-centric** tasks, achieving an average improvement of 34.74% over the base model while consistently surpassing text-only and image-only modes.
- By fine-tuning with **merely ~24K** samples, it achieves out-of-domain performance that rivals or even surpasses leading large-scale, proprietary VLMs.
+ We present **ThinkMorph**, a unified model fine-tuned on ∼24K high-quality interleaved reasoning traces across tasks, learning to generate progressive text–image reasoning steps that
+ concretely manipulate visual content while maintaining coherent verbal logic.
+
+ Beyond strong vision-benchmark performance and robust out-of-domain generalization, ThinkMorph demonstrates emergent multimodal intelligence, including novel visual manipulation skills and so on.
+ These findings suggest promising directions for characterizing the emergent capabilities of unified models for multimodal reasoning.
 
- Intriguingly, ThinkMorph unlocks emergent properties that represent a *hallmark of multimodal intelligence*: the elicitation of unseen visual manipulation skills, the self-adaptive switching between reasoning modes according to task complexity, and better test-time scaling via diversified thoughts.
- <p align="center">
- <img src="https://github.com/ThinkMorph/ThinkMorph/raw/main/assets/emrging_prop.jpg" width="100%"> <br>
- </p>
- These findings suggest promising directions for future work to characterize the emergent capabilities of unified models for multimodal reasoning.
 
 
 ## 📊 Benchmarks
@@ -95,5 +84,13 @@ These findings suggest promising directions for future work to characterize the
 ## ✍️ Citation
 
 ```bibtex
-
+ @misc{gu2025thinkmorphemergentpropertiesmultimodal,
+ title={ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning},
+ author={Jiawei Gu and Yunzhuo Hao and Huichen Will Wang and Linjie Li and Michael Qizhe Shieh and Yejin Choi and Ranjay Krishna and Yu Cheng},
+ year={2025},
+ eprint={2510.27492},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV},
+ url={https://arxiv.org/abs/2510.27492},
+ }
 ```
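
The removed News entry above pointed readers to the released checkpoint under the ThinkMorph org on the Hugging Face Hub (https://huggingface.co/ThinkMorph). A minimal sketch of fetching those files with `huggingface_hub`, assuming a repository id of `ThinkMorph/ThinkMorph-7B` (hypothetical; the README only links the org page, so check it for the exact repository name and the recommended loading code):

```python
# Minimal sketch: download the ThinkMorph checkpoint files from the Hub.
# NOTE: "ThinkMorph/ThinkMorph-7B" is an assumed repository id; the README only
# links the org page (https://huggingface.co/ThinkMorph), so verify the name there.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="ThinkMorph/ThinkMorph-7B")
print(f"Checkpoint files downloaded to: {local_dir}")
```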