johannhartmann's Collections: Document & UI Intelligence
8B • Updated • 52 • 9
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper • 2412.04454 • Published • 71
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Paper • 2401.10935 • Published • 5
Text Generation • 10B • Updated • 1.54k • 18
jadechoghari/Ferret-UI-Llama8b
Image-Text-to-Text • Updated • 2.32k • 68
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Paper • 2410.18967 • Published • 1
Image-Text-to-Text • Updated • 419 • 1.71k
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
Paper • 2501.04575 • Published • 25
Updated • 3.68k • 275
Image-Text-to-Text • 0.3B • Updated • 957 • 99