Harpreet Sahota PRO

harpreetsahota

AI & ML interests

Deep learning, language models, prompt engineering, agents, multi-agent systems

Recent Activity

liked a model about 15 hours ago

ibm-esa-geospatial/Llama3-MS-CLIP-base

updated a dataset about 17 hours ago

Voxel51/RegSegRS

liked a dataset about 17 hours ago

Voxel51/RegSegRS

upvoted 2 papers about 2 months ago

Robot Learning: A Tutorial

Paper • 2510.12403 • Published Oct 14 • 115

CommonForms: A Large, Diverse Dataset for Form Field Detection

Paper • 2509.16506 • Published Sep 20 • 19

upvoted a collection 2 months ago

ModernVBERT

Resources for ModernVBERT • 5 items • Updated Oct 3 • 11

upvoted a collection 3 months ago

Qwen3-VL

37 items • Updated Nov 1 • 502

upvoted an article 3 months ago

Vision Language Model Alignment in TRL ⚡️

Article • Aug 7 • 101

upvoted a collection 3 months ago

Granite Docling

5 items • Updated 23 days ago • 59

upvoted an article 3 months ago

PP-OCRv5 on Hugging Face: A Specialized Approach to OCR

Article • Sep 10 • 108

upvoted a collection 3 months ago

PP-OCRv5

PP-OCRv5 is the latest text recognition solution, supporting Simplified Chinese, Chinese Pinyin, Traditional Chinese, English, and Japanese • 13 items • Updated Sep 15 • 50

upvoted 2 collections 4 months ago

UI-Venus

7 items • Updated Oct 13 • 22

Releases July 25

28 items • Updated Jul 30 • 3

upvoted a collection 5 months ago

Releases July 18

34 items • Updated Jul 23 • 4

upvoted an article 6 months ago

Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub

Article • Jun 27 • 29

upvoted a collection 6 months ago

V-JEPA 2

A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13 • 173

upvoted an article 6 months ago

ScreenSuite - The most comprehensive evaluation suite for GUI Agents!

Article • Jun 6 • 55

upvoted 3 collections 6 months ago

Holo1

Vision-Language Action Model for use in Surfer-H web navigation agent • 6 items • Updated Jun 10 • 48

AGUVIS: Unified Pure Vision GUI Agents

https://aguvis-project.github.io • 3 items • Updated Dec 20, 2024 • 7

MiMo-VL

6 items • Updated 19 days ago • 38

upvoted a collection 7 months ago

MiniCPM-o & MiniCPM-V

Multimodal models with leading performance. • 28 items • Updated Sep 1 • 59

upvoted an article 7 months ago

Vision Language Models (Better, faster, stronger)

Article • May 12 • 568

upvoted a collection 8 months ago

April 11 Releases

22 items • Updated Apr 16 • 7