Zheqi He

philokey

AI & ML interests

None yet

Recent Activity

liked a dataset about 3 hours ago

BAAI/Video-SafetyBench

Viewer • Updated May 21 • 2.26k • 104 • 5

upvoted a paper 9 days ago

GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation

Paper • 2512.01801 • Published 10 days ago • 23

updated a dataset about 1 month ago

FlagEval/coco_val2014_sampled

Viewer • Updated Nov 6 • 1k • 89

upvoted a paper about 1 month ago

Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench

Paper • 2510.26865 • Published Oct 30 • 11

commented a paper about 1 month ago

Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench

Paper • 2510.26865 • Published Oct 30 • 11

authored a paper about 1 month ago

Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench

Paper • 2510.26865 • Published Oct 30 • 11

upvoted a paper about 1 month ago

Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs

Paper • 2505.11842 • Published May 17 • 2

updated a dataset about 1 month ago

FlagEval/MeasureBench

Viewer • Updated Nov 3 • 2.44k • 300 • 1

published a dataset about 1 month ago

FlagEval/MeasureBench

Viewer • Updated Nov 3 • 2.44k • 300 • 1

authored 4 papers 3 months ago

CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning

Paper • 2401.14011 • Published Jan 25, 2024 • 1

Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs

Paper • 2505.11842 • Published May 17 • 2

RoboBrain 2.0 Technical Report

Paper • 2507.02029 • Published Jul 2 • 33

FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions

Paper • 2509.17177 • Published Sep 21 • 13

upvoted a paper 3 months ago

FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions

Paper • 2509.17177 • Published Sep 21 • 13

updated 2 datasets 7 months ago

BAAI/Video-SafetyBench

Viewer • Updated May 21 • 2.26k • 104 • 5

BAAI/TRUE

Viewer • Updated May 16 • 1.96k • 1.03k

published 2 datasets 7 months ago

BAAI/TRUE

Viewer • Updated May 16 • 1.96k • 1.03k

BAAI/Video-SafetyBench

Viewer • Updated May 21 • 2.26k • 104 • 5

published an article about 1 year ago

Article

Letting Large Models Debate: The First Multilingual LLM Debate Competition

Nov 20, 2024

liked a Space about 1 year ago

FlagEval-Debate

🐠

Display a debate interface
